
An introduction to the psych package: Part I

data entry and data description

William Revelle
Department of Psychology
Northwestern University

April 23, 2017

Contents

0.1 Jump starting the psych package: a guide for the impatient
0.2 Psychometric functions are summarized in the second vignette

1 Overview of this and related documents

2 Getting started

3 Basic data analysis
  3.1 Getting the data by using read.file
  3.2 Data input from the clipboard
  3.3 Basic descriptive statistics
    3.3.1 Outlier detection using outlier
    3.3.2 Basic data cleaning using scrub
    3.3.3 Recoding categorical variables into dummy coded variables
  3.4 Simple descriptive graphics
    3.4.1 Scatter Plot Matrices
    3.4.2 Density or violin plots
    3.4.3 Means and error bars
    3.4.4 Error bars for tabular data
    3.4.5 Two dimensional displays of means and errors
    3.4.6 Back to back histograms
    3.4.7 Correlational structure
    3.4.8 Heatmap displays of correlational structure
  3.5 Testing correlations
  3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

4 Multilevel modeling
  4.1 Decomposing data into within and between level correlations using statsBy
  4.2 Generating and displaying multilevel data
  4.3 Factor analysis by groups

5 Multiple Regression, mediation, moderation, and set correlations
  5.1 Multiple regression from data or correlation matrices
  5.2 Mediation and Moderation analysis
  5.3 Set Correlation

6 Converting output to APA style tables using LaTeX

7 Miscellaneous functions

8 Data sets

9 Development version and a users guide

10 Psychometric Theory

11 SessionInfo

0.1 Jump starting the psych package: a guide for the impatient

You have installed psych (section 2) and you want to use it without reading much more. What should you do?

1. Activate the psych package:

library(psych)

2. Input your data (section 3.1). There are two ways to do this:

• Find and read standard files using read.file. This will open a search window for your operating system which you can use to find the file. If the file has a suffix of .text, .txt, .csv, .data, .sav, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, or .RData, then the file will be opened and the data will be read in.

myData <- read.file()   # find the appropriate file using your normal operating system

• Alternatively, go to your friendly text editor or data manipulation program (e.g., Excel) and copy the data to the clipboard. Include a first line that has the variable labels. Paste it into psych using the read.clipboard.tab command:

myData <- read.clipboard.tab()   # if on the clipboard

Note that there are a number of options for read.clipboard for reading in Excel based files, lower triangular files, etc.

3. Make sure that what you just read is right. Describe it (section 3.3) and perhaps look at the first and last few lines. If you have multiple groups, try describeBy.

dim(myData)   # What are the dimensions of the data?

describe(myData)   # or

describeBy(myData, group = mygroups)   # for descriptive statistics by groups

headTail(myData)   # show the first and last n lines of a file

4. Look at the patterns in the data. If you have fewer than about 12 variables, look at the SPLOM (Scatter Plot Matrix) of the data using pairs.panels (section 3.4.1). Then use the outlier function to detect outliers.

pairs.panels(myData)

outlier(myData)

5. Note that you might have some weird subjects, probably due to data entry errors. Either edit the data by hand (use the edit command) or just scrub the data (section 3.3.2).

cleaned <- scrub(myData, max = 9)   # e.g., change anything greater than 9 to NA

6. Graph the data with error bars for each variable (section 3.4.3).

error.bars(myData)


7. Find the correlations of all of your data. lowerCor will by default find the pairwise correlations, round them to 2 decimals, and display the lower off diagonal matrix.

• Descriptively (just the values) (section 3.4.7)

r <- lowerCor(myData)   # The correlation matrix, rounded to 2 decimals

• Graphically (section 3.4.8). Another way is to show a heat map of the correlations with the correlation values included.

corPlot(r)   # examine the many options for this function

• Inferentially (the values, the ns, and the p values) (section 3.5)

corr.test(myData)

8. Apply various regression models.

Several functions are meant to do multiple regressions, either from the raw data or from a variance/covariance matrix, or a correlation matrix.

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

myData <- sat.act

colnames(myData) <- c("mod1", "med1", "x1", "x2", "y1", "y2")

setCor(y = c("y1", "y2"), x = c("x1", "x2"), data = myData)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("y1", "y2"), x = c("x1", "x2"), m = "med1", data = myData)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("y1", "y2"), x = c("x1", "x2"), m = "med1", mod = "mod1", data = myData)
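As a rough sketch, the first seven steps can be strung together on the sat.act data set that is used throughout this vignette. This assumes that psych is installed and that sat.act is available from it; grouping by gender and the age cutoff of 60 are illustrative choices only, not recommendations.

```r
# Minimal sketch of the jump-start steps; runs only if psych is installed.
if (requireNamespace("psych", quietly = TRUE)) {
  library(psych)
  myData <- sat.act                      # example data used in this vignette
  print(dim(myData))                     # step 3: dimensions
  desc <- describe(myData)               # step 3: descriptive statistics
  print(desc)
  by_gender <- describeBy(myData, group = myData$gender)  # statistics by group
  print(headTail(myData))                # first and last few rows
  # step 5: illustrative cleaning, ages above 60 in column 3 (age) become NA
  cleaned <- scrub(myData, where = 3, max = 60)
  r <- lowerCor(cleaned)                 # step 7: pairwise correlations
}
```

The graphical steps (pairs.panels, outlier, error.bars, corPlot) are left out here so that the sketch runs without a graphics device; they take the same data frame as their first argument.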

0.2 Psychometric functions are summarized in the second vignette

Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at http://personality-project.org/r.


• Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).

fa.parallel(myData)

vss(myData)

• Factor analyze the data with a specified number of factors (the default is 1). The default method is minimum residual; the default rotation for more than one factor is oblimin. There are many more possibilities. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979). Also consider a hierarchical factor solution to find coefficient ω.

fa(myData)

iclust(myData)

omega(myData)

If you prefer to do a principal components analysis, you may use the principal function. The default is one component.

principal(myData)

• Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.

alpha(myData)   # score all of the items as part of one scale

myKeys <- make.keys(nvar = 20, list(first = c(1, -3, 5, -7, 8:10), second = c(2, 4, -6, 11:15, -16)))

my.scores <- scoreItems(myKeys, myData)   # form several scales

my.scores   # show the highlights of the results
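To make concrete what coefficient α summarizes, here is a hand computation in base R on simulated items. This is a sketch of the textbook formula, not the code used by alpha or scoreItems; the simulated trait and items are hypothetical.

```r
# Coefficient alpha computed by hand on simulated items (base R only).
set.seed(1)
n <- 200                                   # number of simulated subjects
k <- 5                                     # number of items
trait <- rnorm(n)                          # hypothetical common factor
items <- sapply(1:k, function(i) trait + rnorm(n))  # k noisy indicators

# alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
item_var  <- apply(items, 2, var)
total_var <- var(rowSums(items))
alpha_raw <- (k / (k - 1)) * (1 - sum(item_var) / total_var)

# The same quantity expressed through the item covariance matrix
C <- cov(items)
alpha_cov <- (k / (k - 1)) * (1 - sum(diag(C)) / sum(C))

round(c(alpha_raw = alpha_raw, alpha_cov = alpha_cov), 2)
```

The two expressions agree because the variance of the total score equals the sum of all elements of the item covariance matrix.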

At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette, as well as the Overview Vignette, to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.


1 Overview of this and related documents

The psych package (Revelle, 2015) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, in prep), a draft of which is available at http://personality-project.org/r/book/.

Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.

Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes five methods of factor analysis (minimum residual, principal axis, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.

Bifactor and hierarchical factor structures may be estimated by using Schmid Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.

Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.

For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.

multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject; mlArrange converts wide data frames to long data frames suitable for multilevel modeling.

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram, and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at http://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at http://personality-project.org/r/book/psych_for_sem.pdf.)

For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at http://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at http://personality-project.org/r/book.

2 Getting started

Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from, e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

install.packages("ctv")

library(ctv)

install.views("Psychometrics")   # install.views is the ctv function that installs a task view

The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just 3 packages:

install.packages(c("GPArotation", "mnormt"))

Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:

library(psych)

The other packages, once installed, will be called automatically by psych.

It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}

3.1 Getting the data by using read.file

Although many find it convenient to copy the data to the clipboard and then use the read.clipboard functions (see below), a helpful alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .txt, .text, .r, .rds, .rda, .csv, .xpt, or .sav, the file will be read correctly.

my.data <- read.file()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

my.data <- read.file(widths = c(4, rep(1, 35)))   # will read in a file without a header row and 36 fields, the first of which is 4 columns, the rest of which are 1 column each
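The meaning of the widths vector can be illustrated with the base R function read.fwf. This is a sketch that does not call read.file itself; the file contents are made up for the illustration.

```r
# Base R sketch of fixed-width input; the three records are hypothetical.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1001123",
             "1002456",
             "1003789"), tmp)

# widths: a 4-column id followed by three fields of 1 column each
fwf <- read.fwf(tmp, widths = c(4, rep(1, 3)), header = FALSE)
print(fwf)

unlink(tmp)   # remove the temporary file
```

Each record is cut at fixed character positions, so the first four characters become the first variable and each following character becomes its own variable.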

If the file is a .RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata), the object will be loaded. Depending what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT), it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

my.spss <- read.file(use.value.labels=TRUE)   # this will keep the value labels for .sav files

3.2 Data input from the clipboard

There are, of course, many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

> my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   # define the tab option, or

> my.tab.data <- read.clipboard.tab()   # just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).

> my.data <- read.clipboard.fwf(widths=c(5, 2, rep(1, 5), rep(3, 4)))
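Why empty spreadsheet cells call for the tab option can be seen with base R alone. The sketch below uses read.table on an in-memory string that stands in for the clipboard; the three-column data set is made up.

```r
# A tab-delimited block with two empty cells, as a spreadsheet would copy it.
txt <- "id\tx\ty\n1\t3\t\n2\t\t5\n3\t4\t6"

# With an explicit tab separator, empty cells are kept as missing values.
tab <- read.table(text = txt, header = TRUE, sep = "\t")
print(tab)
```

With the default white space separator, the rows with empty cells appear to have fewer fields than the header, so the read would typically fail or misalign; naming the tab as the separator preserves the position of each empty cell.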

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

> library(psych)

> data(sat.act)

> describe(sat.act)   # basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59
          kurtosis   se
gender       -1.62 0.02
education    -0.07 0.05
age           2.42 0.36
ACT           0.53 0.18
SATV          0.33 4.27
SATQ         -0.02 4.41
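Most of the columns in this table can be reproduced with base R. The sketch below does so for one small made-up vector; the 10% trimming fraction and the use of mad with its default scaling are assumptions about what the trimmed and mad columns report, and skew and kurtosis are omitted.

```r
# Hand computation of describe-style statistics for a hypothetical variable.
x <- c(2, 4, 4, 5, 5, 6, 7, 8, 9, 30)
n <- length(x)

stats <- c(n       = n,
           mean    = mean(x),
           sd      = sd(x),
           median  = median(x),
           trimmed = mean(x, trim = 0.1),  # drop the lowest and highest 10%
           mad     = mad(x),               # median absolute deviation
           min     = min(x),
           max     = max(x),
           range   = diff(range(x)),
           se      = sd(x) / sqrt(n))      # standard error of the mean

round(stats, 2)
```

The one extreme value pulls the mean well above the trimmed mean and the median, which is the kind of discrepancy worth looking for in the table above (compare mean, trimmed, and median for age).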

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> # basic descriptive statistics by a grouping variable

> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)

> headTail(sa.mat)

        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
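The idea can be sketched in base R with the mahalanobis function. This is an illustration of the technique on simulated data with one planted outlier, not the code of the outlier function itself.

```r
# Squared Mahalanobis distances compared with chi-square quantiles (base R).
set.seed(42)
x <- matrix(rnorm(200 * 3), ncol = 3)   # 200 simulated cases, 3 variables
x[1, ] <- c(6, -6, 6)                   # plant one obvious outlier in row 1

# distance of every row from the multivariate centroid
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# expected chi-square quantiles with df = number of variables
expected <- qchisq(ppoints(nrow(x)), df = ncol(x))

plot(expected, sort(d2),
     xlab = "Expected chi-square quantile",
     ylab = "Squared Mahalanobis distance")
abline(0, 1)                            # points far above the line are suspect

head(order(d2, decreasing = TRUE), 3)   # rows with the largest distances
```

Sorting d2 in decreasing order, as in the last line, is the same strategy suggested for the d2 vector returned by outlier.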

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the x axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste('V', 1:10, sep='')
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x

       V1  V2  V3  V4  V5  V6  V7  V8  V9 V10
 [1,]   1   2  NA  NA  NA   6   7   8   9  10
 [2,]  11  12  NA  NA  NA  16  17  18  19  20
 [3,]  21  22  NA  NA  NA  26  27  28  29  30
 [4,]  31  32  33  NA  NA  36  37  38  39  40
 [5,]  41  42  43  44  NA  46  47  48  49  50
 [6,]  51  52  53  54  55  56  57  58  59  60
 [7,]  61  62  63  64  65  66  67  68  69  70
 [8,]  71  72  NA  NA  NA  76  77  78  79  80
 [9,]  81  82  NA  NA  NA  86  87  88  89  90
[10,]  91  92  NA  NA  NA  96  97  98  99 100
[11,] 101 102  NA  NA  NA 106 107 108 109 110
[12,] 111 112  NA  NA  NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
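The replacement rule in this example can be written out in base R. The loop below is a hand-rolled illustration of the rule as described in the text (per-column minimum, common maximum, one forbidden value); it is not the implementation of scrub, and the helper names are hypothetical.

```r
# Re-create the example matrix and apply the same rule by hand (base R only).
x <- matrix(1:120, ncol = 10, byrow = TRUE)
colnames(x) <- paste0("V", 1:10)

where    <- 3:5              # columns to examine
mins     <- c(30, 40, 50)    # one minimum per examined column
max_val  <- 70               # common maximum
is_value <- 45               # value to be treated as missing

new.x <- x
for (j in seq_along(where)) {
  col <- where[j]
  bad <- x[, col] < mins[j] | x[, col] > max_val | x[, col] == is_value
  new.x[bad, col] <- NA      # replace offending values with NA
}

sum(is.na(new.x))            # number of values replaced
```

Only columns 3 to 5 are touched; row 4, for example, keeps 33 in V3 (not below 30) but loses 34 and 35, which fall below the minimums of 40 and 50.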

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
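Both recodings can be sketched in base R. The example below builds one binary column per category and, separately, converts the category labels to integer codes; it illustrates the idea rather than the output format of dummy.code or char2numeric, and the data are made up.

```r
# Dummy coding and numeric recoding of a hypothetical categorical variable.
major <- c("psych", "bio", "psych", "math", "bio")
f <- factor(major)                     # levels are sorted: bio, math, psych

# one 0/1 column per category
dummies <- sapply(levels(f), function(l) as.integer(f == l))
print(dummies)

# integer codes for the same categories
codes <- as.numeric(f)
print(codes)
```

Each row of the dummy coded matrix contains exactly one 1, so any one column is redundant given the others; this matters when all of them are entered into the same regression.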

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.
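Since pairs.panels is described as adapted from the help page for pairs, the layout can be approximated with base graphics. The sketch below follows the panel-function pattern of that help page on the built-in iris data; it omits the ellipses and is not the code of pairs.panels.

```r
# Base R approximation of a SPLOM with histograms and correlations.
panel.hist <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))                 # restore the coordinate system
  par(usr = c(usr[1:2], 0, 1.5))
  h <- hist(x, plot = FALSE)
  y <- h$counts / max(h$counts)           # rescale bar heights to 0..1
  nB <- length(h$breaks)
  rect(h$breaks[-nB], 0, h$breaks[-1], y, col = "grey")
}

panel.cor <- function(x, y, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  r <- cor(x, y, use = "pairwise.complete.obs")
  text(0.5, 0.5, format(round(r, 2), nsmall = 2))
}

pairs(iris[1:4],
      lower.panel = panel.smooth,         # scatter plot with a lowess line
      upper.panel = panel.cor,            # Pearson correlation
      diag.panel  = panel.hist,           # histogram of each variable
      pch = ".")                          # small plot character
```

Setting pch = "." follows the advice above for plotting many subjects.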

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).
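The two ingredients of such a plot, the quartiles of a box plot and a density estimate for each group, can be computed in base R. The sketch uses the built-in iris data as a stand-in; it shows the quantities involved rather than how violinBy draws them.

```r
# Quartiles and density estimates by group (base R only).
groups <- split(iris$Sepal.Length, iris$Species)

# 25th, 50th, and 75th percentiles for each group (what a box plot shows)
q <- sapply(groups, quantile, probs = c(0.25, 0.5, 0.75))
print(q)

# kernel density estimate for each group (what the violin outline shows)
dens <- lapply(groups, density)
sapply(dens, function(d) d$x[which.max(d$y)])   # location of each density peak
```

The quartiles compress each group to three numbers, whereas the density estimate keeps the whole shape, including any skew or bimodality.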


> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.


> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+              main="Affect varies by movies")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[, 1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+              main="Density distributions of four measures of affect")
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.


> data(sat.act)
> violinBy(sat.act[5:6], sat.act$gender, grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")

[Plot: "Density Plot by gender for SAT V and Q". y axis: Observed (200 to 800); x axis: SATV M, SATV F, SATQ M, SATQ F.]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.


3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot, which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
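The two kinds of standard error mentioned in this list can be worked through in base R. The numbers below are made up, and using the normal quantile qnorm(0.975) for the 95% interval is an assumption made for the illustration; the error bar functions may use a different multiplier.

```r
# Standard error of a proportion: sqrt(p * q / N), with q = 1 - p
N <- 100
p <- 0.5
se_p <- sqrt(p * (1 - p) / N)
se_p

# Normal-theory 95% confidence interval for a mean
x <- c(4, 8, 6, 5, 7, 9, 3, 6)        # hypothetical scores
se_m <- sd(x) / sqrt(length(x))       # standard error of the mean
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se_m
round(ci, 2)
```

Because the interval is the mean plus or minus a fixed multiple of the standard error, it is symmetric around the mean by construction.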

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10], epi.bfi$epilie<4)

[Plot: "0.95% confidence limits". x axis: Independent Variable (bfagree, bfcon, bfext, bfneur, bfopen); y axis: Dependent Variable (50 to 150).]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.


> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+               labels=c("Male","Female"), ylab="SAT score", xlab="")

[Plot: "0.95% confidence limits". x axis: Male, Female; y axis: SAT score (200 to 800).]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+                main="Proportion of sample by education level")

[Plot: "Proportion of sample by education level". x axis: Level of Education (M 0, M 1, M 2, M 3, M 4, M 5, ...); y axis: Proportion of Education Level (0.00 to 0.30).]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

22
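The arithmetic behind the three way options can be sketched in base R. This is only an illustration: the counts are made up (they are not the sat.act data), and the standard error uses the usual binomial formula sqrt(p(1-p)/N), which is an assumption here rather than a statement of what error.bars.tab computes internally.

```r
# Hypothetical 2 x 3 table of counts (not the sat.act data)
counts <- matrix(c(20, 30, 50,
                   40, 35, 25), nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("M", "F"),
                                 education = c("0", "1", "2")))
N <- sum(counts)

p.both  <- counts / N                         # proportions of the grand total
se.both <- sqrt(p.both * (1 - p.both) / N)    # assumed binomial standard errors
p.cols  <- prop.table(counts, margin = 2)     # column wise proportions
p.rows  <- prop.table(counts, margin = 1)     # row wise proportions

round(p.both, 2)
round(se.both, 3)
```

With proportions of the grand total, all cells sum to 1; with column wise or row wise proportions, each column or each row sums to 1.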

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function (or, as in Figure 9, errorCircles). For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure: two panels with group means labeled Sad, Horror, Neutral, Happy. Left panel "Movies effect on arousal": x axis "Energetic Arousal" (8 to 20), y axis "Tense Arousal" (10 to 22). Right panel "Movies effect on affect": x axis "Positive Affect" (6 to 12), y axis "Negative Affect" (2 to 10)]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn   age   ACT  SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn   age   ACT  SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn   age   ACT  SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
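The layout produced by lowerUpper can be imitated in base R, which may help when reading the two tables above. The sketch below is not the psych implementation; it uses simulated data, and the names g1, g2, r1 and r2 are made up for the illustration. The sign of the difference (first matrix minus second) is chosen to agree with the output shown above, where 0.61 - 0.52 = 0.09.

```r
set.seed(42)
vars <- c("a", "b", "c")
g1 <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, vars))  # hypothetical group 1
g2 <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, vars))  # hypothetical group 2
r1 <- cor(g1)
r2 <- cor(g2)

# group 1 below the diagonal, group 2 above the diagonal
both <- r1
both[upper.tri(both)] <- r2[upper.tri(r2)]
diag(both) <- NA

# group 1 below the diagonal, (group 1 - group 2) above the diagonal
diffs <- r1
diffs[upper.tri(diffs)] <- (r1 - r2)[upper.tri(r1)]
diag(diffs) <- NA

round(both, 2)
round(diffs, 2)
```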

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)).

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
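The raw-below, adjusted-above layout of the probability matrix can be reproduced with base R alone, which makes the role of the Holm correction concrete. The sketch below is an assumption-laden illustration, not the corr.test code: it uses the built-in mtcars data rather than sat.act, and cor.test and p.adjust from the stats package.

```r
vars <- mtcars[, c("mpg", "hp", "wt", "qsec")]   # illustrative data only
pairs <- combn(names(vars), 2)                   # all 6 pairs of variables

# raw two-tailed probability of each Pearson correlation
raw.p <- apply(pairs, 2, function(v)
  cor.test(vars[[v[1]]], vars[[v[2]]])$p.value)

# Holm (1979) adjustment for the 6 simultaneous tests
holm.p <- p.adjust(raw.p, method = "holm")

# raw values below the diagonal, adjusted values above it
p.mat <- matrix(0, 4, 4, dimnames = list(names(vars), names(vars)))
p.mat[lower.tri(p.mat)] <- holm.p
p.mat <- t(p.mat)                 # adjusted values now sit above the diagonal
p.mat[lower.tri(p.mat)] <- raw.p
signif(p.mat, 2)
```

An adjusted probability is never smaller than the corresponding raw probability, which is why the entries above the diagonal in Table 1 are at least as large as their mirror images below it.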

Which of the four tests r.test performs depends upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18 with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8) #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2 with probability 0.23
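The first two tests are simple enough to check by hand. The base R sketch below uses the textbook formulas (a t test with n - 2 degrees of freedom, a Fisher z confidence interval, and a z test on the difference of two Fisher z values); it is offered as a worked check on the numbers printed above, not as the source of r.test.

```r
## Case 1: a single correlation, n = 50, r = .3
n <- 50
r <- 0.3
t.value <- r * sqrt(n - 2) / sqrt(1 - r^2)
p.t <- 2 * pt(-abs(t.value), df = n - 2)          # two-tailed probability
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

## Case 2: two independent correlations, n = n2 = 30
n1 <- 30
n2 <- 30
z.diff <- abs(atanh(0.4) - atanh(0.6)) /
  sqrt(1 / (n1 - 3) + 1 / (n2 - 3))               # difference / its standard error
p.z <- 2 * pnorm(-z.diff)

round(c(t = t.value, lower = ci[1], upper = ci[2]), 2)   # compare with case 1
round(c(z = z.diff, p = p.z), 2)                         # compare with case 2
```

Rounded to two places, these agree with the t value of 2.18, the interval from 0.02 to 0.53, and the z value of 0.99 with probability 0.32 reported above.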

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
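The idea of summing squared Fisher z values can be sketched in a few lines of base R. The scaling by n - 3 and the helper name chisq.identity are assumptions made for this illustration; the details of what cortest does (handling of missing data, comparison of two matrices) are not reproduced here.

```r
# Hypothetical helper: chi square test that all population correlations are zero
chisq.identity <- function(R, n) {
  z <- atanh(R[lower.tri(R)])        # Fisher z of the below-diagonal correlations
  chisq <- (n - 3) * sum(z^2)        # assumed scaling: each z has variance 1/(n - 3)
  df <- length(z)                    # p * (p - 1) / 2 distinct correlations
  c(chisq = chisq, df = df,
    p = pchisq(chisq, df, lower.tail = FALSE))
}

R <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])   # illustrative data only
result <- chisq.identity(R, n = nrow(mtcars))
result
```

For the six sat.act variables, the same counting rule gives 6 * 5 / 2 = 15 degrees of freedom, as in the output above.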

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
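How much φ underestimates the latent correlation can be seen in a small simulation. The sketch below assumes the special case in which both variables are cut at their medians, where φ = (2/π) asin(ρ) and so ρ = sin(πφ/2); for other cut points no such closed form applies and the correlation has to be found numerically. The sample size and seed are arbitrary choices for the illustration.

```r
rho <- 0.5
phi.expected <- (2 / pi) * asin(rho)   # 1/3 when rho = .5 and both cuts are at 0

set.seed(17)
n <- 100000
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)    # bivariate normal with correlation rho

# dichotomize at 0 and take the Pearson correlation of the 0/1 scores
phi.observed <- cor(as.numeric(x > 0), as.numeric(y > 0))

# invert the median-cut relationship to recover the latent correlation
rho.recovered <- sin(pi * phi.observed / 2)

round(c(phi.expected = phi.expected, phi.observed = phi.observed,
        rho.recovered = rho.recovered), 2)
```

The expected value of 1/3 corresponds to the rho = 0.5, phi = 0.33 annotation in Figure 14.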

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
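The smoothing steps just described can be sketched in base R. The function name smooth.sketch, the floor eps, and the small example matrix are all assumptions for the illustration; cor.smooth may choose its replacement eigenvalues differently.

```r
# Hypothetical helper following the three steps described in the text
smooth.sketch <- function(R, eps = 1e-8) {
  e <- eigen(R, symmetric = TRUE)
  values <- pmax(e$values, eps)                 # make the smallest eigenvalues positive
  values <- values * nrow(R) / sum(values)      # rescale to sum to the number of variables
  S <- e$vectors %*% diag(values) %*% t(e$vectors)
  S <- cov2cor(S)                               # put exact 1s back on the diagonal
  dimnames(S) <- dimnames(R)
  S
}

# A made-up "correlation" matrix that is not positive semi-definite
bad <- matrix(c( 1.0,  0.9,  0.9,
                 0.9,  1.0, -0.9,
                 0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)
smoothed <- smooth.sketch(bad)

round(eigen(bad)$values, 2)        # includes a negative eigenvalue
round(smoothed, 2)
```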

> draw.tetra()

[Figure: bivariate normal scatter with rho = 0.5 and phi = 0.33, cut at τ on X and Τ on Y into four quadrants (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with marginal normal densities dnorm(x) marking X > τ and Y > Τ]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Figure: "Bivariate density, rho = 0.5"; perspective plot with axes x, y, z]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (r_wg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where r_xy is the normal correlation, which may be decomposed into within group and between group correlations r_xy_wg and r_xy_bg, and η (eta) is the correlation of the data with the within group values or the group means.
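Equation 1 can be verified numerically with base R. The sketch below uses simulated data with an arbitrary grouping variable; the variable names are made up, and the between group terms are computed on group means repeated for every member of the group, which is what weights them by group size.

```r
set.seed(1)
g <- rep(1:5, each = 20)             # hypothetical grouping variable, 5 groups of 20
x <- rnorm(100) + g
y <- 0.5 * x + rnorm(100) - g

xb <- ave(x, g)                      # group mean of x, one value per observation
yb <- ave(y, g)
xw <- x - xb                         # within group deviations
yw <- y - yb

r.xy <- cor(x, y)
within.part  <- cor(x, xw) * cor(y, yw) * cor(xw, yw)   # eta_xwg * eta_ywg * r_xywg
between.part <- cor(x, xb) * cor(y, yb) * cor(xb, yb)   # eta_xbg * eta_ybg * r_xybg

c(r.xy = r.xy, recomposed = within.part + between.part)
```

The two numbers agree to machine precision, because the within group deviations are uncorrelated with the group means in the sample.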

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5) #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.69     0.63         0.50      0.58        0.48

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.48     0.40         0.25      0.34        0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary SentCompletion FirstLetters
     3.69       3.88           3.00         1.35

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.59     0.58         0.49      0.58        0.45

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.34     0.34         0.24      0.33        0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
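The beta weights, multiple R2 and VIF in such output all follow from the correlation matrix by standard matrix algebra: beta = Rxx^-1 rxy, R2 = rxy' beta, and VIF = 1/(1-SMC), which equals the diagonal of Rxx^-1. The base R sketch below illustrates this on the built-in mtcars data (an arbitrary choice) and checks the result against lm on standardized raw data; it is a sketch of the algebra, not of the setCor source.

```r
R <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])   # illustrative data only
x <- c("hp", "wt", "qsec")
y <- "mpg"

beta <- solve(R[x, x], R[x, y])     # standardized regression weights
R2   <- sum(beta * R[x, y])         # squared multiple correlation
smc  <- 1 - 1 / diag(solve(R[x, x]))
vif  <- 1 / (1 - smc)               # equals diag(solve(R[x, x]))

# the same model from raw data, with every variable standardized
fit <- lm(scale(mpg) ~ scale(hp) + scale(wt) + scale(qsec), data = mtcars)

round(rbind(from.R = as.numeric(beta), from.lm = as.numeric(coef(fit)[-1])), 3)
round(c(R2 = R2, lm.R2 = summary(fit)$r.squared), 3)
```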

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.58     0.46         0.21      0.18        0.30

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
          0.331    0.210        0.043     0.032       0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion FirstLetters
          1.02         1.02

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.44     0.35         0.17      0.14        0.26

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.19     0.12         0.03      0.02        0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x_1, x_2, ..., x_i) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
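The relationships among the paths can be sketched with three ordinary regressions and a percentile bootstrap. Everything in the block below is an assumption made for illustration: the data are simulated (they are not the sobel data), the number of bootstrap samples is kept small, and the percentile interval is one common choice rather than a description of what mediate does internally.

```r
set.seed(2004)
n <- 200
x <- rnorm(n)                          # hypothetical predictor
m <- 0.5 * x + rnorm(n)                # hypothetical mediator
y <- 0.4 * m + 0.2 * x + rnorm(n)      # hypothetical criterion

c.total  <- coef(lm(y ~ x))[["x"]]     # total effect of x on y
a        <- coef(lm(m ~ x))[["x"]]     # path from x to m
fit      <- lm(y ~ x + m)
c.direct <- coef(fit)[["x"]]           # direct effect of x with m in the model
b        <- coef(fit)[["m"]]           # path from m to y
ab       <- a * b                      # indirect effect

# percentile bootstrap of the indirect effect
boot.ab <- replicate(500, {
  i <- sample(n, replace = TRUE)
  coef(lm(m[i] ~ x[i]))[[2]] * coef(lm(y[i] ~ x[i] + m[i]))[[3]]
})
ci <- quantile(boot.ab, c(0.025, 0.975))

round(c(total = c.total, direct = c.direct, indirect = ab), 2)
round(ci, 2)
```

For least squares estimates on the same cases, the total effect equals the direct effect plus ab, which is also why 0.43 + 0.33 = 0.76 in the output above.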

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"),x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"),x = c("education","age"),m= "ACT", data =sat.act,std=TRUE,n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Figure: "Mediation model" path diagram: THERAPY to ATTRIB (0.82), ATTRIB to SATIS (0.4), THERAPY to SATIS (c = 0.76, c' = 0.43)]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure: "Regression Models" path diagram with THERAPY and ATTRIB predicting SATIS; path values 0.43, 0.4, and 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where λ_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$
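The set correlation can be computed directly from its definition. The sketch below uses the standard canonical correlation product Rxx^-1 Rxy Ryy^-1 Ryx, whose eigenvalues are the squared canonical correlations, and checks them against cancor from the stats package. The choice of the mtcars variables is arbitrary, and the code is an illustration of the formula rather than of the setCor implementation.

```r
X <- as.matrix(mtcars[, c("hp", "wt", "qsec")])   # illustrative x set
Y <- as.matrix(mtcars[, c("mpg", "disp")])        # illustrative y set

Rxx <- cor(X)
Ryy <- cor(Y)
Rxy <- cor(X, Y)

M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
k <- min(ncol(X), ncol(Y))                 # number of non-zero eigenvalues
lambda <- Re(eigen(M)$values)[seq_len(k)]  # squared canonical correlations

set.R2 <- 1 - prod(1 - lambda)             # Cohen's set correlation
cc <- cancor(X, Y)$cor                     # canonical correlations from stats

round(c(lambda = lambda, set.R2 = set.R2), 3)

# the four squared canonical correlations printed in section 5.1
thurstone.R2 <- 1 - prod(1 - c(0.6280, 0.1478, 0.0076, 0.0049))
round(thurstone.R2, 2)
```

Applied to the squared canonical correlations of the Thurstone example, the product formula gives the set correlation R2 of 0.69 reported in section 5.1.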

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0
R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: "Moderation model" path diagram with ACT, gender and ACTXgndr predicting SATQ through education; paths to education 0.16, 0.09, -0.01; c = 0.58, c' = 0.59 for ACT; c = -0.14, c' = -0.14 for gender; c = 0, c' = 0 for ACTXgndr; education to SATQ -0.04]

Figure 18: Moderated multiple regression requires the raw data.

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1 5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or knitr packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2    com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
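A table such as Table 2 can be produced with a few calls. The following is a minimal sketch, not the code used to build this document. It assumes that the Thurstone and sat.act data sets are available after loading psych, that n.obs = 213 is the appropriate sample size for Thurstone, and that GPArotation is installed for the default oblimin rotation. In releases newer than the one described here the LaTeX helpers may be distributed in the companion psychTools package, so that package is loaded when it is present.

```r
library(psych)
# In later releases the *2latex helpers may live in psychTools (assumption)
if (requireNamespace("psychTools", quietly = TRUE)) library(psychTools)

# Three factor solution for the Thurstone correlation matrix
f3 <- fa(Thurstone, nfactors = 3, n.obs = 213)

# LaTeX source for a factor loading table like Table 2
fa2latex(f3, heading = "A factor analysis table from the psych package in R")

# Lower triangle of a correlation matrix as a LaTeX table
cor2latex(Thurstone, heading = "Correlations of the Thurstone ability items")

# Any data frame, here the descriptive statistics for sat.act
df2latex(describe(sat.act), heading = "Descriptive statistics for sat.act")
```

Each call writes LaTeX source to the console, which can then be pasted into a manuscript; the heading argument sets the title shown above the table.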


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
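Several of these helpers are easiest to understand by comparing them with their textbook definitions. The sketch below is illustrative only: the small vectors, item data and matrices are made up, and headTail is used as the spelling of headtail found in more recent releases (an assumption about the installed version).

```r
library(psych)

# fisherz: Fisher r-to-z transformation, z = 0.5 * log((1 + r) / (1 - r))
r <- c(0.1, 0.5, 0.9)
z <- fisherz(r)
print(all.equal(z, atanh(r)))   # atanh() is the same transformation

# geometric.mean and harmonic.mean compared with their definitions
x <- c(1, 2, 4, 8)
gm <- geometric.mean(x)
hm <- harmonic.mean(x)
print(all.equal(gm, exp(mean(log(x)))))
print(all.equal(hm, 1 / mean(1 / x)))

# reverse.code: a key of -1 reverses an item, a key of 1 leaves it alone
# (items are assumed to be scored from 1 to 5)
items <- data.frame(q1 = c(1, 2, 5), q2 = c(5, 4, 1))
rc <- reverse.code(keys = c(1, -1), items = items, mini = 1, maxi = 5)
print(rc)

# superMatrix: A in the top left, B in the lower right, zeros elsewhere
A <- matrix(1, 2, 2, dimnames = list(c("a1", "a2"), c("a1", "a2")))
B <- matrix(2, 3, 3, dimnames = list(c("b1", "b2", "b3"), c("b1", "b2", "b3")))
S <- superMatrix(A, B)
print(S)

# headTail: first and last rows of a data set
headTail(sat.act)
```

The comparisons print TRUE when the psych function and the base R expression agree, which is a quick way to confirm what a helper computes before relying on it.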

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
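The data sets can be inspected directly once the package is loaded. The sketch below uses only sat.act and Thurstone, and assumes both are available from psych itself. In later releases some of the other data sets listed above may be distributed in the companion psychTools package instead (an assumption about the installed versions), so the final lines list whatever each installed package provides.

```r
library(psych)

# sat.act: self reported SAT and ACT scores with age, gender, and education
print(dim(sat.act))
describe(sat.act)

# Thurstone: a correlation matrix of ability items
print(dim(Thurstone))
lowerMat(Thurstone)

# Names of the data sets shipped with the installed packages
psych_sets <- data(package = "psych")$results[, "Item"]
print(psych_sets)
if (requireNamespace("psychTools", quietly = TRUE)) {
  print(data(package = "psychTools")$results[, "Item"])
}
```

Checking the dimensions first is a cheap way to confirm that the object is the one described in the text, for example 700 subjects for sat.act and a 9 x 9 matrix for Thurstone.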

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
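From the R console, the same installation can be sketched with install.packages. This assumes that the repository named above is still reachable at that address and that installing from source is acceptable on your platform; neither is guaranteed for every release.

```r
# Install the development version from the repository named in the text
# (assumes the repository is reachable and serves a source package)
install.packages("psych",
                 repos = "http://personality-project.org/r",
                 type = "source")

# Confirm which version is now installed
packageVersion("psych")
```

Restart R after installing so that the new version, rather than one already loaded in the session, is the one in use.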

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



    36 Polychoric tetrachoric polyserial and biserial correlations 34

    4 Multilevel modeling 3441 Decomposing data into within and between level correlations using statsBy 3742 Generating and displaying multilevel data 3743 Factor analysis by groups 38

    5 Multiple Regression mediation moderation and set correlations 3851 Multiple regression from data or correlation matrices 3852 Mediation and Moderation analysis 4053 Set Correlation 44

    6 Converting output to APA style tables using LATEX 47

    7 Miscellaneous functions 49

    8 Data sets 50

    9 Development version and a users guide 51

    10 Psychometric Theory 52

    11 SessionInfo 52

    2

    01 Jump starting the psych packagendasha guide for the impatient

    You have installed psych (section 2) and you want to use it without reading much moreWhat should you do

    1 Activate the psych package

    library(psych)

    2 Input your data (section 31) There are two ways to do this

    bull Find and read standard files using readfile This will open a search windowfor your operating system which you can use to find the file If the file has asuffix of text txt csv data sav r R rds Rds rda Rda rdata orRData then the file will be opened and the data will be read in

    myData lt- readfile() find the appropriate file using your normal operating system

    bull Alternatively go to your friendly text editor or data manipulation program(eg Excel) and copy the data to the clipboard Include a first line that has thevariable labels Paste it into psych using the readclipboardtab command

    myData lt- readclipboardtab() if on the clipboard

    Note that there are number of options for readclipboard for reading in Excelbased files lower triangular files etc

    3 Make sure that what you just read is right Describe it (section 33) and perhapslook at the first and last few lines If you have multiple groups try describeBy

    dim(myData) What are the dimensions of the data

    describe(myData) or

    descrbeBy(myDatagroups=mygroups) for descriptive statistics by groups

    headTail(myData) show the first and last n lines of a file

    4 Look at the patterns in the data If you have fewer than about 12 variables lookat the SPLOM (Scatter Plot Matrix) of the data using pairspanels (section 341)Then use the outlier function to detect outliers

    pairspanels(myData)

    outlier(myData)

    5 Note that you might have some weird subjects probably due to data entry errorsEither edit the data by hand (use the edit command) or just scrub the data (section332)

    cleaned lt- scrub(myData max=9) eg change anything great than 9 to NA

    6 Graph the data with error bars for each variable (section 343)

    errorbars(myData)

    3

    7 Find the correlations of all of your data lowerCor will by default find the pairwisecorrelations round them to 2 decimals and display the lower off diagonal matrix

    bull Descriptively (just the values) (section 347)

    r lt- lowerCor(myData) The correlation matrix rounded to 2 decimals

    bull Graphically (section 348) Another way is to show a heat map of the correla-tions with the correlation values included

    corPlot(r) examine the many options for this function

    bull Inferentially (the values the ns and the p values) (section 35)

    corrtest(myData)

    8 Apply various regression models

    Several functions are meant to do multiple regressions either from the raw data orfrom a variancecovariance matrix or a correlation matrix

    bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

    myData lt- satact

    colnames(myData) lt- c(mod1med1x1x2y1y2)

    setCor(y = c( y1 y2) x = c(x1x2) data = myData)

    bull mediate will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables mediatedthrough a mediation variable It then tests the mediation effect using a bootstrap

    mediate(y = c( y1 y2) x = c(x1x2) m= med1 data = myData)

    bull mediate will take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple xvariables mediated through a mediation variable It then tests the mediationeffect using a boot strap

    mediate(y = c( y1 y2) x = c(x1x2) m= med1 mod = mod1 data = myData)

    02 Psychometric functions are summarized in the second vignette

    Many additional functions particularly designed for basic and advanced psychomet-rics are discussed more fully in the Overview Vignette A brief review of the functionsavailable is included here In addition there are helpful tutorials for Finding omegaHow to score scales and find reliability and for Using psych for factor analysis athttppersonality-projectorgr

    4

    bull Test for the number of factors in your data using parallel analysis (faparallelsection ) or Very Simple Structure (vss )

    faparallel(myData)

    vss(myData)

    bull Factor analyze (see section ) the data with a specified number of factors(the default is 1) the default method is minimum residual the default rotationfor more than one factor is oblimin There are many more possibilities (seesections -) Compare the solution to a hierarchical cluster analysis using theICLUST algorithm (Revelle 1979) (see section ) Also consider a hierarchicalfactor solution to find coefficient ω (see )

    fa(myData)

    iclust(myData)

    omega(myData)

    If you prefer to do a principal components analysis you may use the principalfunction The default is one component

    principal(myData)

    bull Some people like to find coefficient α as an estimate of reliability This may bedone for a single scale using the alpha function (see ) Perhaps more usefulis the ability to create several scales as unweighted averages of specified itemsusing the scoreItems function (see ) and to find various estimates of internalconsistency for these scales find their intercorrelations and find scores for allthe subjects

    alpha(myData) score all of the items as part of one scale

    myKeys lt- makekeys(nvar=20list(first = c(1-35-7810)second=c(24-61115-16)))

    myscores lt- scoreItems(myKeysmyData) form several scales

    myscores show the highlights of the results

    At this point you have had a chance to see the highlights of the psych package and to dosome basic (and advanced) data analysis You might find reading this entire vignette aswell as the Overview Vignette to be helpful to get a broader understanding of what can bedone in R using the psych Remember that the help command () is available for everyfunction Try running the examples for each help page

    5

    1 Overview of this and related documents

    The psych package (Revelle 2015) has been developed at Northwestern University since2005 to include functions most useful for personality psychometric and psychological re-search The package is also meant to supplement a text on psychometric theory (Revelleprep) a draft of which is available at httppersonality-projectorgrbook

    Some of the functions (eg readfile readclipboard describe pairspanels scat-terhist errorbars multihist bibars) are useful for basic data entry and descrip-tive analyses

    Psychometric applications emphasize techniques for dimension reduction including factoranalysis cluster analysis and principal components analysis The fa function includesfive methods of factor analysis (minimum residual principal axis weighted least squaresgeneralized least squares and maximum likelihood factor analysis) Principal ComponentsAnalysis (PCA) is also available through the use of the principal or pca functions De-termining the number of factors or components to extract may be done by using the VerySimple Structure (Revelle and Rocklin 1979) (vss) Minimum Average Partial correlation(Velicer 1976) (MAP) or parallel analysis (faparallel) criteria These and several othercriteria are included in the nfactors function Two parameter Item Response Theory(IRT) models for dichotomous or polytomous items may be found by factoring tetra-

    choric or polychoric correlation matrices and expressing the resulting parameters interms of location and discrimination using irtfa

    Bifactor and hierarchical factor structures may be estimated by using Schmid Leimantransformations (Schmid and Leiman 1957) (schmid) to transform a hierarchical factorstructure into a bifactor solution (Holzinger and Swineford 1937) Higher order modelscan also be found using famulti

    Scale construction can be done using the Item Cluster Analysis (Revelle 1979) (iclust)function to determine the structure and to calculate reliability coefficients α (Cronbach1951)(alpha scoreItems scoremultiplechoice) β (Revelle 1979 Revelle and Zin-barg 2009) (iclust) and McDonaldrsquos ωh and ωt (McDonald 1999) (omega) Guttmanrsquos sixestimates of internal consistency reliability (Guttman (1945) as well as additional estimates(Revelle and Zinbarg 2009) are in the guttman function The six measures of Intraclasscorrelation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available

    For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy will also find within group (or subject) correlations as well as the between group correlation.

    multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject. mlArrange converts wide data frames to long data frames suitable for multilevel modeling.

    Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram, and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

    This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at http://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at http://personality-project.org/r/book/psych_for_sem.pdf.)

    For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at http://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at http://personality-project.org/r/book.

    2 Getting started

    Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid, is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

    install.packages("ctv")
    library(ctv)
    install.views("Psychometrics")   # install every package listed in the task view

    The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is only necessary to install the following packages:

    install.packages(c("GPArotation", "mnormt"))

    Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

    3 Basic data analysis

    A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

    Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:

    library(psych)

    The other packages, once installed, will be called automatically by psych.

    It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

    .First <- function(x) {library(psych)}

    3.1 Getting the data by using read.file

    Although many find it convenient to copy the data to the clipboard and then use the read.clipboard functions (see below), a helpful alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .txt, .text, .r, .rds, .rda, .csv, .xpt, or .sav, the file will be read correctly.

    my.data <- read.file()

    If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

    my.data <- read.file(widths = c(4,rep(1,35)))  # will read in a file without a header row and 36 fields, the first of which is 4 columns, the rest of which are 1 column each

    If the file is a RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata), the object will be loaded. Depending upon what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT), it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

    To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

    my.spss <- read.file(use.value.labels=TRUE)  # this will keep the value labels for .sav files
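The idea of choosing a reader from the file suffix can be illustrated with base R alone. The sketch below is only an illustration of that idea, not the code used inside read.file: the helper name read_by_suffix is hypothetical, and it handles just three suffixes.

```r
# Hypothetical helper: pick a base R reader from the file suffix.
# This illustrates the dispatch idea only; it is not psych's read.file.
read_by_suffix <- function(path, header = TRUE) {
  suffix <- tolower(tools::file_ext(path))
  switch(suffix,
    csv  = read.csv(path, header = header),
    txt  = ,                                   # falls through to "text"
    text = read.table(path, header = header),
    rds  = readRDS(path),
    stop("no reader defined in this sketch for suffix: ", suffix)
  )
}

# Usage: write a small comma separated file and read it back
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)), tmp, row.names = FALSE)
my.data <- read_by_suffix(tmp)
print(my.data)
```

Unlike read.file, this sketch does not open a file chooser; the path has to be supplied by the caller.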

    3.2 Data input from the clipboard

    There are, of course, many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

    read.clipboard is the base function for reading data from the clipboard.

    read.clipboard.csv for reading text that is comma delimited.

    read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

    read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

    read.clipboard.upper for reading input of an upper triangular matrix.

    read.clipboard.fwf for reading in fixed width fields (some very old data sets).

    For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

    my.data <- read.clipboard()

    This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

    > my.data <- read.clipboard(sep="\t")   # define the tab option, or
    > my.tab.data <- read.clipboard.tab()   # just use the alternative function

    For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).

    > my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
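To see what such a widths vector does, the same layout can be tried without the clipboard by using the base R function read.fwf on a text connection. The two records below are made up for illustration; each is 24 characters wide (5 + 2 + 5 x 1 + 4 x 3).

```r
# Field widths from the example above: 5, 2, five 1-column fields, four 3-column fields
w <- c(5, 2, rep(1, 5), rep(3, 4))

# Two invented fixed width records, 24 characters each
records <- c("000012112345001002003004",
             "000023454321010020030040")
stopifnot(all(nchar(records) == sum(w)))

# read.fwf accepts a connection in place of a file name
my.data <- read.fwf(textConnection(records), widths = w)
print(my.data)   # 2 rows, 11 variables named V1 ... V11
```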

    3.3 Basic descriptive statistics

    Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

    describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

    describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

    > library(psych)
    > data(sat.act)
    > describe(sat.act)  #basic descriptive statistics
              vars   n   mean     sd median trimmed    mad min max range  skew
    gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
    education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
    age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
    ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
    SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
    SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59
              kurtosis   se
    gender       -1.62 0.02
    education    -0.07 0.05
    age           2.42 0.36
    ACT           0.53 0.18
    SATV          0.33 4.27
    SATQ         -0.02 4.41
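Most of the columns in this table are ordinary base R summaries. The following sketch computes several of them for a small made-up vector; it is meant only to show what the columns represent, and the 10% trimming used for the trimmed mean is an assumption made for this illustration.

```r
x <- c(2, 4, 4, 5, 7, 9, 11, NA)   # invented scores with one missing value
x <- x[!is.na(x)]                  # summarize the observed values only
n <- length(x)

basic <- c(n       = n,
           mean    = mean(x),
           sd      = sd(x),
           median  = median(x),
           trimmed = mean(x, trim = 0.1),   # assumed trimming proportion
           mad     = mad(x),
           min     = min(x),
           max     = max(x),
           range   = max(x) - min(x),
           se      = sd(x) / sqrt(n))       # standard error of the mean
print(round(basic, 2))
```

The standard error in the last column is simply the standard deviation divided by the square root of the number of non-missing cases, which is why SATQ (n = 687) has a slightly larger se than SATV.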

    These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

    > #basic descriptive statistics by a grouping variable.
    > describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)
     Descriptive statistics by group
    group: 1
              vars   n   mean     sd   se
    gender       1 247   1.00   0.00 0.00
    education    2 247   3.00   1.54 0.10
    age          3 247  25.86   9.74 0.62
    ACT          4 247  28.79   5.06 0.32
    SATV         5 247 615.11 114.16 7.26
    SATQ         6 245 635.87 116.02 7.41
    ------------------------------------------------------------
    group: 2
              vars   n   mean     sd   se
    gender       1 453   2.00   0.00 0.00
    education    2 453   3.26   1.35 0.06
    age          3 453  25.45   9.37 0.44
    ACT          4 453  28.42   4.69 0.22
    SATV         5 453 610.66 112.31 5.28
    SATQ         6 442 596.00 113.07 5.38

    The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

    > sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
    +  skew=FALSE,ranges=FALSE,mat=TRUE)
    > headTail(sa.mat)
            item group1 group2 vars   n   mean     sd    se
    gender1    1      1      0    1  27      1      0     0
    gender2    2      2      0    1  30      2      0     0
    gender3    3      1      1    1  20      1      0     0
    gender4    4      2      1    1  25      2      0     0
    ...     <NA>   <NA>   <NA>  ... ...    ...    ...   ...
    SATQ9     69      1      4    6  51  635.9 104.12 14.58
    SATQ10    70      2      4    6  86 597.59 106.24 11.46
    SATQ11    71      1      5    6  46 657.83  89.61 13.21
    SATQ12    72      2      5    6  93 606.72 105.55 10.95

    3.3.1 Outlier detection using outlier

    One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
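The computation just described can be sketched with base R functions. This is an illustration of the technique rather than the outlier function itself: it uses the built-in iris measurements as stand-in data and drops incomplete cases, which is an assumption of the sketch.

```r
x <- iris[, 1:4]                     # built-in numeric data, used only as an example
x <- x[complete.cases(x), ]          # the sketch assumes complete cases

# squared Mahalanobis distance of every row from the multivariate centroid
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# expected chi-square quantiles with df = number of variables
expected <- qchisq(ppoints(nrow(x)), df = ncol(x))

# Q-Q plot of observed D2 against expected chi-square values
plot(expected, sort(d2),
     xlab = "Quantiles of chi-square", ylab = "Mahalanobis D2")
abline(0, 1)

# the five most extreme cases
head(sort(d2, decreasing = TRUE), 5)
```

Points that rise well above the reference line in the upper right of the plot are the candidates for closer inspection.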

    3.3.2 Basic data cleaning using scrub

    If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

    > png('outlier.png')
    > d2 <- outlier(sat.act,cex=.8)
    > dev.off()
    null device
              1

    Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the x axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

    Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

    > x <- matrix(1:120,ncol=10,byrow=TRUE)
    > colnames(x) <- paste('V',1:10,sep='')
    > new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
    > new.x
           V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
     [1,]   1   2 NA NA NA   6   7   8   9  10
     [2,]  11  12 NA NA NA  16  17  18  19  20
     [3,]  21  22 NA NA NA  26  27  28  29  30
     [4,]  31  32 33 NA NA  36  37  38  39  40
     [5,]  41  42 43 44 NA  46  47  48  49  50
     [6,]  51  52 53 54 55  56  57  58  59  60
     [7,]  61  62 63 64 65  66  67  68  69  70
     [8,]  71  72 NA NA NA  76  77  78  79  80
     [9,]  81  82 NA NA NA  86  87  88  89  90
    [10,]  91  92 NA NA NA  96  97  98  99 100
    [11,] 101 102 NA NA NA 106 107 108 109 110
    [12,] 111 112 NA NA NA 116 117 118 119 120

    Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
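The replacement rules in this example can also be written out directly in base R, which makes it easy to check which cells end up missing. The loop below is a sketch of the same rules (a per-column minimum, a common maximum of 70, and the single value 45); it does not call scrub.

```r
x <- matrix(1:120, ncol = 10, byrow = TRUE)
colnames(x) <- paste0("V", 1:10)

new.x <- x
cols <- 3:5
mins <- c(30, 40, 50)        # one minimum for each of columns 3, 4, 5

for (i in seq_along(cols)) {
  v <- new.x[, cols[i]]
  # below the column minimum, above 70, or exactly 45 becomes NA
  v[v < mins[i] | v > 70 | v == 45] <- NA
  new.x[, cols[i]] <- v
}

colSums(is.na(new.x))[cols]  # number of values set to NA in V3, V4, V5
```

In this particular matrix the value 45 falls in column 5, where it is already below the minimum of 50, so the isvalue rule does not remove anything extra.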

    3.3.3 Recoding categorical variables into dummy coded variables

    Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in e.g., spider plots.

    Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
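As a minimal sketch of what dummy coding and character-to-numeric conversion mean, the base R lines below build one binary column per category and then convert the same variable to integer codes. The category labels are invented, and this is not the code used by dummy.code or char2numeric.

```r
major <- c("psych", "bio", "psych", "chem", "bio")   # invented categories

# one binary (0/1) variable per category
lev <- sort(unique(major))
dummies <- sapply(lev, function(l) as.integer(major == l))
print(dummies)

# converting the categorical variable to numeric codes
# (codes follow the alphabetical order of the categories)
codes <- as.numeric(factor(major))
print(codes)
```

Each row of dummies has exactly one 1, so any one of the columns is redundant given the others; this matters when the dummy codes are used together as predictors in a regression.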

    3.4 Simple descriptive graphics

    Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

    3.4.1 Scatter Plot Matrices

    Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

    pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.

    Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: a depressing movie, a frightening movie, a neutral movie, and a comedy.

    Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

    3.4.2 Density or violin plots

    Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

    > png('pairspanels.png')
    > sat.d2 <- data.frame(sat.act,d2) #combine the d2 statistics from before with the sat.act data.frame
    > pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
    > dev.off()
    null device
              1

    Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

    > png('affect.png')
    > pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
    +      main="Affect varies by movies ")
    > dev.off()
    null device
              1

    Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

    > keys <- make.keys(msq[1:75],list(
    + EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
    +        "lively", "-sleepy", "-tired", "-drowsy"),
    + TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
    +        "-placid", "-calm", "-at.rest"),
    + PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
    +        "interested", "enthusiastic", "proud", "alert"),
    + NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
    +        "upset", "hostile", "irritable" )) )
    > scores <- scoreItems(keys,msq[,1:75])
    > png('msq.png')
    > pairs.panels(scores$scores,smoother=TRUE,
    +  main ="Density distributions of four measures of affect" )
    > dev.off()
    null device
              1

    Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3,896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

    > data(sat.act)
    > violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

    [Plot: "Density Plot by gender for SAT V and Q"; y axis "Observed" (200 to 800); groups SATV M, SATV F, SATQ M, SATQ F.]

    Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, and 25th and 75th percentiles, as well as the entire range and the density distribution.

    3.4.3 Means and error bars

    Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

    error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include ± one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

    error.bars.by does the same, but grouping the data by some condition.

    error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

    error.crosses draws the confidence intervals for an x set and a y set of the same size.
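The two quantities behind these plots, a confidence interval for a mean and the standard error of a proportion, are easy to compute by hand. The sketch below uses made-up scores and a normal quantile for the interval; whether a given plotting function uses normal or t quantiles is not stated here, so that choice is an assumption of the illustration.

```r
x <- c(12, 15, 9, 14, 11, 13, 10, 16)     # invented scores
n <- length(x)
se <- sd(x) / sqrt(n)                     # standard error of the mean

# symmetric 95% interval around the mean (normal quantile assumed)
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se
print(round(c(mean = mean(x), lower = ci[1], upper = ci[2]), 2))

# standard error of a proportion: sqrt(p * q / N), with q = 1 - p
p <- 0.5
N <- 100
se.p <- sqrt(p * (1 - p) / N)
print(se.p)   # 0.05
```

Because the interval is built as mean ± a multiple of the standard error, it is symmetric by construction, whatever the shape of the data.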

    The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

    Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

    3.4.4 Error bars for tabular data

    However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

    > data(epi.bfi)
    > error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

    [Plot: "0.95% confidence limits"; x axis "Independent Variable" (bfagree, bfcon, bfext, bfneur, bfopen); y axis "Dependent Variable".]

    Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

    > error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
    +         labels=c("Male","Female"),ylab="SAT score",xlab="")

    [Plot: "0.95% confidence limits"; x axis groups Male and Female; y axis "SAT score" (200 to 800).]

    Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

    > T <- with(sat.act,table(gender,education))
    > rownames(T) <- c("M","F")
    > error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
    + main="Proportion of sample by education level")

    [Plot: "Proportion of sample by education level"; x axis "Level of Education" (M 0 through M 5); y axis "Proportion of Education Level" (0.00 to 0.30).]

    Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

    3.4.5 Two dimensional displays of means and errors

    Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

    > op <- par(mfrow=c(1,2))
    > data(affect)
    > colors <- c("black","red","white","blue")
    > films <- c("Sad","Horror","Neutral","Happy")
    > affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
    + xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
    + cex=2,colors=colors, main ="Movies effect on arousal")
    > errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
    +  ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
    > op <- par(mfrow=c(1,1))

    [Two panels: "Movies effect on arousal" (x axis "Energetic Arousal", y axis "Tense Arousal") and "Movies effect on affect" (x axis "Positive Affect", y axis "Negative Affect"); points labeled Sad, Horror, Neutral, Happy.]

    Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

    3.4.6 Back to back histograms

    The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

    > data(bfi)
    > png('bibars.png')
    > with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
    > dev.off()
    null device
              1

    Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

    3.4.7 Correlational structure

    There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

    > lowerCor(sat.act)
              gendr edctn age   ACT   SATV  SATQ
    gender     1.00
    education  0.09  1.00
    age       -0.02  0.55  1.00
    ACT       -0.04  0.15  0.11  1.00
    SATV      -0.02  0.05 -0.04  0.56  1.00
    SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
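A display like the one above can be approximated in base R by computing pairwise Pearson correlations, rounding, and blanking the upper triangle. The sketch below uses a few columns of the built-in mtcars data purely as an example; it is not the code of lowerCor or lowerMat.

```r
x <- mtcars[, c("mpg", "hp", "wt", "qsec")]    # built-in data, for illustration only

# full correlation matrix using pairwise deletion of missing values
r <- cor(x, use = "pairwise.complete.obs", method = "pearson")

# round to 2 digits and show only the lower off diagonal part
lower <- round(r, 2)
lower[upper.tri(lower)] <- NA
print(lower, na.print = "")
```

The full matrix r is still available for later analyses; only the printed copy has its upper triangle suppressed.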

    When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

    > female <- subset(sat.act,sat.act$gender==2)
    > male <- subset(sat.act,sat.act$gender==1)
    > lower <- lowerCor(male[-1])
              edctn age   ACT   SATV  SATQ
    education  1.00
    age        0.61  1.00
    ACT        0.16  0.15  1.00
    SATV       0.02 -0.06  0.61  1.00
    SATQ       0.08  0.04  0.60  0.68  1.00
    > upper <- lowerCor(female[-1])
              edctn age   ACT   SATV  SATQ
    education  1.00
    age        0.52  1.00
    ACT        0.16  0.08  1.00
    SATV       0.07 -0.03  0.53  1.00
    SATQ       0.03 -0.09  0.58  0.63  1.00
    > both <- lowerUpper(lower,upper)
    > round(both,2)
              education   age  ACT  SATV  SATQ
    education        NA  0.52 0.16  0.07  0.03
    age            0.61    NA 0.08 -0.03 -0.09
    ACT            0.16  0.15   NA  0.53  0.58
    SATV           0.02 -0.06 0.61    NA  0.63
    SATQ           0.08  0.04 0.60  0.68    NA

    It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

    > diffs <- lowerUpper(lower,upper,diff=TRUE)
    > round(diffs,2)
              education   age  ACT  SATV SATQ
    education        NA  0.09 0.00 -0.05 0.05
    age            0.61    NA 0.07 -0.03 0.13
    ACT            0.16  0.15   NA  0.08 0.02
    SATV           0.02 -0.06 0.61    NA 0.05
    SATQ           0.08  0.04 0.60  0.68   NA
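The way the two triangles are combined can be sketched with base R indexing. The correlation matrices g1 and g2 below are generated from random numbers and are purely illustrative; the sketch mirrors the layout shown above (first group below the diagonal, second group or the difference above) without calling lowerUpper.

```r
set.seed(42)
g1 <- cor(matrix(rnorm(200), ncol = 4))   # stand-in for the first group's correlations
g2 <- cor(matrix(rnorm(200), ncol = 4))   # stand-in for the second group's correlations

# first group below the diagonal, second group above it
both <- g1
both[upper.tri(both)] <- g2[upper.tri(g2)]
diag(both) <- NA

# first group below the diagonal, (first - second) above it
diffs <- g1
diffs[upper.tri(diffs)] <- (g1 - g2)[upper.tri(g1)]
diag(diffs) <- NA

round(both, 2)
round(diffs, 2)
```

In the sat.act example, the entry 0.09 for education and age above the diagonal is the male correlation (0.61) minus the female correlation (0.52).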

    3.4.8 Heatmap displays of correlational structure

    Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

    Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

    3.5 Testing correlations

    Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
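The Holm adjustment itself is available in base R through p.adjust. The short example below applies it to four made-up raw probabilities so that the effect of the correction can be seen directly.

```r
raw.p <- c(0.01, 0.02, 0.03, 0.04)            # invented raw probabilities

# Holm: multiply the smallest p by 4, the next by 3, and so on,
# then force the adjusted values to be non-decreasing
holm.p <- p.adjust(raw.p, method = "holm")
print(holm.p)   # 0.04 0.06 0.06 0.06

# Bonferroni, for comparison, multiplies every p by the number of tests
bonf.p <- p.adjust(raw.p, method = "bonferroni")
print(bonf.p)   # 0.04 0.08 0.12 0.16
```

The Holm values are never larger than the Bonferroni values, which is why Holm is a reasonable default when many correlations are tested at once.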

    Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)).

    > png('corplot.png')
    > corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
    > dev.off()
    null device
              1

    Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

    > png('circplot.png')
    > circ <- sim.circ(24)
    > r.circ <- cor(circ)
    > corPlot(r.circ,main='24 variables in a circumplex')
    > dev.off()
    null device
              1

    Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

    > png('spider.png')
    > op <- par(mfrow=c(2,2))
    > spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
    > op <- par(mfrow=c(1,1))
    > dev.off()
    null device
              1

    Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

    Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

    > corr.test(sat.act)
    Call:corr.test(x = sat.act)

    Correlation matrix
              gender education   age   ACT  SATV  SATQ
    gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
    education   0.09      1.00  0.55  0.15  0.05  0.03
    age        -0.02      0.55  1.00  0.11 -0.04 -0.03
    ACT        -0.04      0.15  0.11  1.00  0.56  0.59
    SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
    SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

    Sample Size
              gender education age ACT SATV SATQ
    gender       700       700 700 700  700  687
    education    700       700 700 700  700  687
    age          700       700 700 700  700  687
    ACT          700       700 700 700  700  687
    SATV         700       700 700 700  700  687
    SATQ         687       687 687 687  687  687

    Probability values (Entries above the diagonal are adjusted for multiple tests.)
              gender education  age  ACT SATV SATQ
    gender      0.00      0.17 1.00 1.00    1    0
    education   0.02      0.00 0.00 0.00    1    1
    age         0.58      0.00 0.00 0.03    1    1
    ACT         0.33      0.00 0.00 0.00    0    0
    SATV        0.62      0.22 0.26 0.00    0    0
    SATQ        0.00      0.36 0.37 0.00    0    0

    To see confidence intervals of the correlations, print with the short=FALSE option


    depending upon the input:

    1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

    > r.test(50,.3)
    Correlation tests
    Call:r.test(n = 50, r12 = 0.3)
    Test of significance of a correlation
     t value 2.18    with probability < 0.034
     and confidence interval 0.02 0.53

    2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

    > r.test(30,.4,.6)
    Correlation tests
    Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
    Test of difference between two independent correlations
     z value 0.99    with probability 0.32

    3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

    > r.test(103,.4,.5,.1)
    Correlation tests
    Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
    Test of difference between two correlated correlations
     t value -0.89    with probability < 0.37

    4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

    > r.test(103,.5,.6,.7,.5,.5,.8) #steiger Case B
    Correlation tests
    Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
        r24 = 0.8)
    Test of difference between two dependent correlations
     z value -1.2    with probability 0.23
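The first three tests can be checked by hand. The following is a minimal base R sketch that does not call r.test; the helper names t_single, z_indep, and t_williams are hypothetical, and the use of Williams' t for Steiger's case A is an assumption about the computation, adopted here because it reproduces the printed value.

```r
# Test 1: is a single correlation different from zero?
t_single <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

# Test 2: difference between two independent correlations (Fisher r-to-z)
z_indep <- function(r12, r34, n, n2 = n) {
  (atanh(r12) - atanh(r34)) / sqrt(1 / (n - 3) + 1 / (n2 - 3))
}

# Test 3: two dependent correlations sharing one variable (Williams' t, assumed)
t_williams <- function(r12, r13, r23, n) {
  detR <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23
  rbar <- (r12 + r13) / 2
  (r12 - r13) * sqrt((n - 1) * (1 + r23) /
    (2 * ((n - 1) / (n - 3)) * detR + rbar^2 * (1 - r23)^3))
}

t1 <- t_single(0.3, 50)
p1 <- 2 * pt(-abs(t1), df = 50 - 2)   # two-tailed probability
z2 <- z_indep(0.4, 0.6, 30)
p2 <- 2 * pnorm(-abs(z2))
t3 <- t_williams(r12 = 0.4, r13 = 0.5, r23 = 0.1, n = 103)

# Compare with the t, z, and t values printed for tests 1 to 3
round(c(t1 = t1, z2 = abs(z2), t3 = t3), 2)
```

The sign of z2 depends only on which correlation is entered first, so its absolute value is compared with the printed 0.99.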

    To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

    > cortest(sat.act)
    Tests of correlation matrices
    Call:cortest(R1 = sat.act)
     Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
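The logic of this test can be sketched in a few lines of base R. This is an illustration on simulated, independent variables, not the cortest code itself; the scaling by n - 3 (the inverse of the sampling variance of a Fisher z) is an assumption about how the squared z scores are turned into a chi square.

```r
# Sketch of a Steiger (1980) style test that all population correlations are zero
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 6), ncol = 6)   # six independent variables (hypothetical data)
R <- cor(X)
z <- atanh(R[lower.tri(R)])           # Fisher z of the 15 unique correlations
chi2 <- (n - 3) * sum(z^2)            # each z has variance 1 / (n - 3) under the null
df <- length(z)                       # p * (p - 1) / 2 = 15 for p = 6
p_value <- pchisq(chi2, df, lower.tail = FALSE)
c(chi2 = chi2, df = df, p = p_value)
```

With six variables the degrees of freedom equal 15, which matches the df reported for the six sat.act variables.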

    3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

    The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
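The attenuation of φ relative to the latent correlation can be seen in a small simulation. This base R sketch assumes cuts at the median of both variables, the one case where the latent correlation has a closed form, sin(πφ/2); general cut points need the iterative estimate that the tetrachoric correlation provides.

```r
set.seed(123)
n <- 100000
rho <- 0.5
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)   # latent bivariate normal with correlation rho
xd <- as.numeric(x > 0)                     # dichotomize both variables at tau = 0
yd <- as.numeric(y > 0)
phi <- cor(xd, yd)                          # Pearson r on the dichotomies is phi
rho_hat <- sin(pi * phi / 2)                # closed form, valid only for median cuts
round(c(phi = phi, rho_hat = rho_hat), 2)
```

For rho = 0.5 the expected φ is (2/π) asin(0.5) = 1/3, the value shown in the draw.tetra figure.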

    Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

    If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

    The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
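The three steps just described (raise the offending eigen values, rescale, rebuild) can be sketched in base R. The function smooth_sketch, its eps argument, and the example matrix are hypothetical and chosen for illustration; this is not the cor.smooth code.

```r
# A hypothetical "correlation" matrix that is not positive semi-definite
R <- matrix(c( 1.0,  0.9,  0.9,
               0.9,  1.0, -0.9,
               0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)

smooth_sketch <- function(R, eps = 1e-6) {
  e <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eps)            # raise negative eigen values to a small positive value
  vals <- vals * nrow(R) / sum(vals)     # rescale so they sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  cov2cor(S)                             # restore a unit diagonal
}

min(eigen(R, symmetric = TRUE)$values)   # negative: R is improper
R_s <- smooth_sketch(R)
min(eigen(R_s, symmetric = TRUE)$values) # positive after smoothing
```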

    4 Multilevel modeling

    Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use


    > draw.tetra()

    [Figure: scatter of a bivariate normal distribution (rho = 0.5, phi = 0.33) with X and Y axes running from -3 to 3, cut at τ on X and at Τ on Y into four quadrants (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ); the marginal normal densities, dnorm(x), mark the regions X > τ and Y > Τ]

    Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


    > draw.cor(expand=20,cuts=c(0,0))

    [Figure: perspective plot titled "Bivariate density, rho = 0.5" with axes x, y, and z]

    Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


    is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

    4.1 Decomposing data into within and between level correlations using statsBy

    There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

    This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

    $$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

    where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
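Equation 1 is an exact identity, which a short simulation can confirm. The sketch below uses base R only and simulated data; the variable names and the generating values (10 groups of 30, a positive between group and a negative within group relationship) are arbitrary choices for illustration.

```r
set.seed(42)
n_groups <- 10
n_per <- 30
g <- rep(seq_len(n_groups), each = n_per)   # grouping variable
gx <- rnorm(n_groups)                       # group level effects
gy <- 0.6 * gx + rnorm(n_groups)            # positive between group relationship
wx <- rnorm(length(g))                      # individual level deviations
wy <- -0.3 * wx + rnorm(length(g))          # negative within group relationship
x <- gx[g] + wx
y <- gy[g] + wy

x_bg <- ave(x, g); y_bg <- ave(y, g)        # group means, repeated for each member
x_wg <- x - x_bg;  y_wg <- y - y_bg         # deviations from the group means

eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)
r_wg <- cor(x_wg, y_wg)                     # pooled within group correlation
r_bg <- cor(x_bg, y_bg)                     # correlation of the (weighted) group means

lhs <- cor(x, y)
rhs <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
all.equal(lhs, rhs)
```

Because the within and between parts can differ in sign, the overall correlation says little about either level taken alone.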

    4.2 Generating and displaying multilevel data

    withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

    sim.multilevel will generate simulated data with a multilevel structure.

    The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


    Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

    4.3 Factor analysis by groups

    Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

    sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
    faBy(sb, nfactors=5)  #find the 5 factor solution for each education level

    5 Multiple Regression, mediation, moderation, and set correlations

    The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

    5.1 Multiple regression from data or correlation matrices

    The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

    > setCor(y = 5:9,x=1:4,data=Thurstone)
    Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

    Multiple Regression from matrix input

    Beta weights
                    Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
    Sentences                    0.09     0.07          0.25      0.21         0.20
    Vocabulary                   0.09     0.17          0.09      0.16        -0.02
    Sent.Completion              0.02     0.05          0.04      0.21         0.08
    First.Letters                0.58     0.45          0.21      0.08         0.31

    Multiple R
    Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
                 0.69     0.63          0.50      0.58         0.48

    multiple R2
    Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
                 0.48     0.40          0.25      0.34         0.23

    Multiple Inflation Factor (VIF) = 1/(1-SMC) =
    Sentences Vocabulary Sent.Completion First.Letters
         3.69       3.88            3.00          1.35

    Unweighted multiple R
    Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
                 0.59     0.58          0.49      0.58         0.45

    Unweighted multiple R2
    Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
                 0.34     0.34          0.24      0.33         0.20

    Various estimates of between set correlations
    Squared Canonical Correlations
    [1] 0.6280 0.1478 0.0076 0.0049

    Average squared canonical correlation =  0.2
    Cohen's Set Correlation R2 =  0.69
    Unweighted correlation between the two sets =  0.73
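Regression from a correlation matrix needs only matrix algebra: the standardized weights are the inverse of the predictor correlations times the predictor-criterion correlations. The base R sketch below illustrates this on the built-in mtcars data (chosen only because it ships with R, it is not part of psych) and checks the result against lm on the raw data. It is an illustration of the principle rather than the setCor code.

```r
# Standardized regression computed from a correlation matrix alone
R <- cor(mtcars[, c("mpg", "wt", "hp")])   # built-in data, used only for illustration
Rxx <- R[c("wt", "hp"), c("wt", "hp")]     # predictor intercorrelations
rxy <- R[c("wt", "hp"), "mpg"]             # predictor-criterion correlations
beta <- solve(Rxx, rxy)                    # standardized beta weights
R2 <- sum(beta * rxy)                      # squared multiple correlation
vif <- diag(solve(Rxx))                    # 1 / (1 - SMC) for each predictor

# The same R2 from the raw data with lm()
fit <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(R2, summary(fit)$r.squared)
```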

    By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

    > sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
    Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

    Multiple Regression from matrix input

    Beta weights
                    Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
    Sent.Completion              0.02     0.05          0.04      0.21         0.08
    First.Letters                0.58     0.45          0.21      0.08         0.31

    Multiple R
    Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
                 0.58     0.46          0.21      0.18         0.30

    multiple R2
    Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
                0.331    0.210         0.043     0.032        0.092

    Multiple Inflation Factor (VIF) = 1/(1-SMC) =
    Sent.Completion First.Letters
               1.02          1.02

    Unweighted multiple R
    Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
                 0.44     0.35          0.17      0.14         0.26

    Unweighted multiple R2
    Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
                 0.19     0.12          0.03      0.02         0.07

    Various estimates of between set correlations
    Squared Canonical Correlations
    [1] 0.405 0.023

    Average squared canonical correlation =  0.21
    Cohen's Set Correlation R2 =  0.42
    Unweighted correlation between the two sets =  0.48

    > round(sc$residual,2)
                      Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
    Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
    Suffixes                       0.11     0.60         -0.01      0.01         0.03
    Letter.Series                  0.09    -0.01          0.75      0.28         0.37
    Pedigrees                      0.06     0.01          0.28      0.66         0.20
    Letter.Group                   0.13     0.03          0.37      0.20         0.77
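The residual matrix can also be written in closed form: removing a set x from a set y leaves R_yy - R_yx R_xx^{-1} R_xy, whose diagonal holds 1 - R2 for each y variable. The base R sketch below illustrates this on the built-in mtcars data (an arbitrary choice for illustration) and compares the result with the covariances of lm residuals from standardized data. It is a sketch of the algebra, not the setCor code.

```r
yv <- c("mpg", "qsec")                      # criterion set (illustrative choice)
xv <- c("wt", "hp")                         # set to be partialled out
R <- cor(mtcars[, c(yv, xv)])

# Residual covariances of the standardized y variables after removing x
resid_R <- R[yv, yv] - R[yv, xv] %*% solve(R[xv, xv]) %*% R[xv, yv]

# The same quantities from regressions on standardized raw data
Z <- as.data.frame(scale(mtcars[, c(yv, xv)]))
E <- residuals(lm(cbind(mpg, qsec) ~ wt + hp, data = Z))
all.equal(resid_R, cov(E), check.attributes = FALSE)
round(resid_R, 2)
```

Applying cov2cor to resid_R would convert these residual covariances into partial correlations.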

    5.2 Mediation and Moderation analysis

    Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,\ldots,i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. (In the output of mediate, c denotes the total effect and c' the direct effect with the mediator removed.) Statistical tests of the ab effect are best done by bootstrapping.
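The path logic and the bootstrap can be sketched with lm alone. The following base R example uses simulated data with arbitrary generating values; the helper paths is hypothetical, and the percentile interval from 500 resamples is one simple bootstrap choice, assumed here for brevity. It is not the mediate code.

```r
set.seed(2004)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)                     # a path: x to m
y <- 0.4 * m + 0.2 * x + rnorm(n)           # b path: m to y; direct path: x to y
d <- data.frame(x, m, y)

# Estimate a, b, the direct effect, and the total effect by ordinary regression
paths <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]
  fit <- coef(lm(y ~ x + m, data = d))
  c(a = a, b = fit[["m"]], c_prime = fit[["x"]],
    c_total = coef(lm(y ~ x, data = d))[["x"]])
}

est <- paths(d)
ab <- est[["a"]] * est[["b"]]
all.equal(est[["c_total"]], est[["c_prime"]] + ab)   # total = direct + indirect

# Percentile bootstrap of the indirect effect ab
boot_ab <- replicate(500, {
  p <- paths(d[sample(n, replace = TRUE), ])
  p[["a"]] * p[["b"]]
})
ci <- quantile(boot_ab, c(0.025, 0.975))
ci
```

An interval for ab that excludes zero is the usual evidence for an indirect effect.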


    Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

    Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

    The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

    Total Direct effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t direct =  2.5   with probability =  0.019
    Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t direct =  1.35   with probability =  0.19
    Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB   =  0.33
    Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.04    Upper CI =  0.69
    R2 of model =  0.31
     To see the longer output, specify short = FALSE in the print statement

     Full output

     Total effect estimates (c)
            SATIS   se   t   Prob
    THERAPY  0.76 0.31 2.5 0.0186

    Direct effect estimates     (c')
            SATIS   se    t  Prob
    THERAPY  0.43 0.32 1.35 0.190
    ATTRIB   0.40 0.18 2.23 0.034

     'a'  effect estimates
           THERAPY  se    t   Prob
    ATTRIB    0.82 0.3 2.74 0.0106

     'b'  effect estimates
           SATIS   se    t  Prob
    ATTRIB   0.4 0.18 2.23 0.034

     'ab'  effect estimates
            SATIS boot   sd lower upper
    THERAPY  0.33 0.32 0.17  0.04  0.69

    • setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

    setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

    • mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

    mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

    • mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


    > mediate.diagram(preacher)

    [Figure: path diagram titled "Mediation model": THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, and THERAPY to SATIS with c = 0.76 and c' = 0.43]

    Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct effect, with Attribution removed, is .43), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


    > preacher <- setCor(1,c(2,3),sobel,std=FALSE)
    > setCor.diagram(preacher)

    [Figure: path diagram titled "Regression Models" with THERAPY and ATTRIB predicting SATIS; values shown are 0.43, 0.4, and 0.21]

    Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.


    for speed. The default number of boot straps is 5000.

    5.3 Set Correlation

    An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

    $$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

    where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

    $$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$
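A short base R sketch makes the definition concrete. It assumes that the eigen values in question are the squared canonical correlations, that is, the eigen values of R_xx^{-1} R_xy R_yy^{-1} R_yx. Part (a) reuses the four squared canonical correlations printed by the first setCor example in section 5.1; part (b) uses the built-in mtcars data (chosen only for illustration) and checks the eigen values against cancor from the stats package.

```r
# (a) Arithmetic check with the squared canonical correlations from section 5.1
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)
round(1 - prod(1 - lambda), 2)   # Cohen's set correlation R2, printed there as 0.69
round(mean(lambda), 2)           # average squared canonical correlation, printed as 0.2

# (b) The same eigen values from a correlation matrix
xv <- c("wt", "hp")
yv <- c("mpg", "qsec", "drat")
R <- cor(mtcars[, c(xv, yv)])
M <- solve(R[xv, xv]) %*% R[xv, yv] %*% solve(R[yv, yv]) %*% R[yv, xv]
lam <- sort(Re(eigen(M)$values), decreasing = TRUE)
set_R2 <- 1 - prod(1 - lam)
all.equal(lam, cancor(mtcars[, xv], mtcars[, yv])$cor^2)
```

Because the product runs over every canonical correlation, one large eigen value is enough to push the set correlation close to 1.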

    Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.

    setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

    Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

    For this example, the analysis is done on the correlation matrix rather than the raw data.

    > C <- cov(sat.act,use="pairwise")
    > model1 <- lm(ACT~ gender + education + age, data=sat.act)
    > summary(model1)

    Call:
    lm(formula = ACT ~ gender + education + age, data = sat.act)

    Residuals:


    Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
        mod = "gender", n.iter = 50, std = TRUE)

    The DV (Y) was  SATQ . The IV (X) was  ACT gender ACTXgndr . The mediating variable(s) =  education .

    Total Direct effect(c) of  ACT  on  SATQ  =  0.58   S.E. =  0.03  t direct =  19.25   with probability =  0
    Direct effect (c') of  ACT  on  SATQ  removing  education  =  0.59   S.E. =  0.03  t direct =  19.26   with probability =  0
    Indirect effect (ab) of  ACT  on  SATQ  through  education   =  -0.01
    Mean bootstrapped indirect effect =  -0.01  with standard error =  0.01  Lower CI =  -0.02    Upper CI =  0

    Total Direct effect(c) of  gender  on  SATQ  =  -0.14   S.E. =  0.03  t direct =  -4.78   with probability =  2.1e-06
    Direct effect (c') of  gender  on  NA  removing  education  =  -0.14   S.E. =  0.03  t direct =  -4.63   with probability =  4.4e-06
    Indirect effect (ab) of  gender  on  SATQ  through  education   =  0
    Mean bootstrapped indirect effect =  -0.01  with standard error =  0.01  Lower CI =  -0.01    Upper CI =  0

    Total Direct effect(c) of  ACTXgndr  on  SATQ  =  0   S.E. =  0.03  t direct =  0.02   with probability =  0.99
    Direct effect (c') of  ACTXgndr  on  NA  removing  education  =  0   S.E. =  0.03  t direct =  0.01   with probability =  0.99
    Indirect effect (ab) of  ACTXgndr  on  SATQ  through  education   =  0
    Mean bootstrapped indirect effect =  -0.01  with standard error =  0.01  Lower CI =  0    Upper CI =  0
    R2 of model =  0.37
     To see the longer output, specify short = FALSE in the print statement

     Full output

     Total effect estimates (c)
              SATQ   se     t     Prob
    ACT       0.58 0.03 19.25 0.00e+00
    gender   -0.14 0.03 -4.78 2.10e-06
    ACTXgndr  0.00 0.03  0.02 9.85e-01

    Direct effect estimates     (c')
              SATQ   se     t     Prob
    ACT       0.59 0.03 19.26 0.00e+00
    gender   -0.14 0.03 -4.63 4.37e-06
    ACTXgndr  0.00 0.03  0.01 9.92e-01

     'a'  effect estimates
             education   se     t     Prob
    ACT           0.16 0.04  4.22 2.77e-05
    gender        0.09 0.04  2.50 1.28e-02
    ACTXgndr     -0.01 0.04 -0.15 8.83e-01

     'b'  effect estimates
               SATQ   se     t  Prob
    education -0.04 0.03 -1.45 0.147

     'ab'  effect estimates
              SATQ  boot   sd lower upper
    ACT      -0.01 -0.01 0.01     0     0
    gender    0.00  0.00 0.00     0     0
    ACTXgndr  0.00  0.00 0.00     0     0

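    [Figure: path diagram titled "Moderation model": ACT, gender, and ACTXgndr predict SATQ through education; paths to education are 0.16, 0.09, and -0.01, the path from education to SATQ is -0.04, and the paths to SATQ are c = 0.58, c' = 0.59 (ACT), c = -0.14, c' = -0.14 (gender), and c = 0, c' = 0 (ACTXgndr)]

    Figure 18: Moderated multiple regression requires the raw data.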


         Min       1Q   Median       3Q      Max
    -25.2458  -3.2133   0.7769   3.5921   9.2630

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
    gender      -0.48606    0.37984  -1.280  0.20110
    education    0.47890    0.15235   3.143  0.00174 **
    age          0.01623    0.02278   0.712  0.47650
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 4.768 on 696 degrees of freedom
    Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301
    F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

    Compare this with the output from setCor.

    > #compare with setCor
    > setCor(c(4:6),c(1:3),C, n.obs=700)
    Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

    Multiple Regression from matrix input

    Beta weights
                ACT  SATV  SATQ
    gender    -0.05 -0.03 -0.18
    education  0.14  0.10  0.10
    age        0.03 -0.10 -0.09

    Multiple R
     ACT SATV SATQ
    0.16 0.10 0.19

    multiple R2
       ACT   SATV   SATQ
    0.0272 0.0096 0.0359

    Multiple Inflation Factor (VIF) = 1/(1-SMC) =
       gender education       age
         1.01      1.45      1.44

    Unweighted multiple R
     ACT SATV SATQ
    0.15 0.05 0.11

    Unweighted multiple R2
     ACT SATV SATQ
    0.02 0.00 0.01

    SE of Beta weights
               ACT SATV SATQ
    gender    0.18 4.29 4.34
    education 0.22 5.13 5.18
    age       0.22 5.11 5.16

    t of Beta Weights
                ACT  SATV  SATQ
    gender    -0.27 -0.01 -0.04
    education  0.65  0.02  0.02
    age        0.15 -0.02 -0.02

    Probability of t <
               ACT SATV SATQ
    gender    0.79 0.99 0.97
    education 0.51 0.98 0.98
    age       0.88 0.98 0.99

    Shrunken R2
       ACT   SATV   SATQ
    0.0230 0.0054 0.0317

    Standard Error of R2
       ACT   SATV   SATQ
    0.0120 0.0073 0.0137

    F
     ACT SATV SATQ
    6.49 2.26 8.63

    Probability of F <
         ACT     SATV     SATQ
    2.48e-04 8.08e-02 1.24e-05

    degrees of freedom of regression
    [1]   3 696

    Various estimates of between set correlations
    Squared Canonical Correlations
    [1] 0.050 0.033 0.008
    Chisq of canonical correlations
    [1] 35.8 23.1  5.6

    Average squared canonical correlation =  0.03
    Cohen's Set Correlation R2 =  0.09
    Shrunken Set Correlation R2 =  0.08
    F and df of Cohen's Set Correlation  7.26 9 1681.86
    Unweighted correlation between the two sets =  0.01

    Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

    6 Converting output to APA style tables using LaTeX

    Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

    An example of converting the output from fa to LaTeX appears in Table 2.

    Table 2: fa2latex
    A factor analysis table from the psych package in R

    Variable          MR1   MR2   MR3   h2   u2  com
    Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
    Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
    Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
    First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
    4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
    Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
    Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
    Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
    Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

    SS loadings      2.64  1.86  1.5

    MR1              1.00  0.59  0.54
    MR2              0.59  1.00  0.52
    MR3              0.54  0.52  1.00


    7 Miscellaneous functions

    A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

    block.random  Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

    df2latex  is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

    cor2latex  Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

    cosinor  One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

    fisherz  Convert a correlation to the corresponding Fisher z score.

    geometric.mean  also harmonic.mean; find the appropriate mean for working with different kinds of data.

    ICC  and cohen.kappa are typically used to find the reliability for raters.

    headtail  combines the head and tail functions to show the first and last lines of a data set or output.

    topBottom  Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

    mardia  calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

    p.rep  finds the probability of replication for an F, t, or r and estimates effect size.

    partial.r  partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

    rangeCorrection  will correct correlations for restriction of range.

    reverse.code  will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

    superMatrix  Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
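Several of these helpers are one-line computations, which the base R sketch below spells out. The names fisher_z, geo_mean, and har_mean are hypothetical stand-ins written for illustration, and the block-diagonal construction only mirrors the description of superMatrix given above; none of this is the psych code.

```r
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))   # identical to atanh(r)
geo_mean <- function(x) exp(mean(log(x)))               # for positive, multiplicative data
har_mean <- function(x) 1 / mean(1 / x)                 # for rates and ratios

fisher_z(0.5)
geo_mean(c(1, 10, 100))   # 10
har_mean(c(30, 60))       # 40

# A block-diagonal "super matrix": A on the top left, B on the lower right, 0s elsewhere
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))
dim(S)
```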

    8 Data sets

    A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

    Thurstone  Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

    bfi  25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis and Item Response Theory analyses.

    sat.act  Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

    epi.bfi  A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.

    iq  14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

    galton  Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

    Dwyer  Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

    miscellaneous  cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

    9 Development version and a user's guide

    The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

    Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

    News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

    > news(Version > "1.7.0",package="psych")

    51

    10 Psychometric Theory

    The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

    For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

    11 SessionInfo

    This document was prepared using the following settings.

    > sessionInfo()

    R Under development (unstable) (2017-03-05 r72309)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: macOS Sierra 10.12.4

    Matrix products: default
    BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
    LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

    locale:
    [1] C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] psych_1.7.4.21

    loaded via a namespace (and not attached):
    [1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67
    [5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0
    [9] lattice_0.20-34

    References

    Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

    Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

    Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

    Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

    Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

    Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

    Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

    Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

    Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

    Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

    Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

    Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

    Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

    Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

    Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

    Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

    Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

    Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

    Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

    Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

    Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

    Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

    Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

    Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

    Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

    MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

    Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

    McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

    Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

    Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

    Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

    Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

    Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

    Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

    Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

    Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

    Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

    Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

    Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

    Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

    Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

    Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

    Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

    Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

    Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

    Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

    Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

    Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

    Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

    Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

    Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

    Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

    Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

    Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

    Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


    • Jump starting the psych package: a guide for the impatient
    • Psychometric functions are summarized in the second vignette
    • Overview of this and related documents
    • Getting started
    • Basic data analysis
      • Getting the data by using read.file
      • Data input from the clipboard
      • Basic descriptive statistics
        • Outlier detection using outlier
        • Basic data cleaning using scrub
        • Recoding categorical variables into dummy coded variables
      • Simple descriptive graphics
        • Scatter Plot Matrices
        • Density or violin plots
        • Means and error bars
        • Error bars for tabular data
        • Two dimensional displays of means and errors
        • Back to back histograms
        • Correlational structure
        • Heatmap displays of correlational structure
      • Testing correlations
      • Polychoric, tetrachoric, polyserial, and biserial correlations
    • Multilevel modeling
      • Decomposing data into within and between level correlations using statsBy
      • Generating and displaying multilevel data
      • Factor analysis by groups
    • Multiple Regression, mediation, moderation, and set correlations
      • Multiple regression from data or correlation matrices
      • Mediation and Moderation analysis
      • Set Correlation
    • Converting output to APA style tables using LaTeX
    • Miscellaneous functions
    • Data sets
    • Development version and a user's guide
    • Psychometric Theory
    • SessionInfo

      0.1 Jump starting the psych package: a guide for the impatient

      You have installed psych (section 2) and you want to use it without reading much more. What should you do?

      1. Activate the psych package:

      library(psych)

      2. Input your data (section 3.1). There are two ways to do this:

      • Find and read standard files using read.file. This will open a search window for your operating system which you can use to find the file. If the file has a suffix of .text, .txt, .csv, .data, .sav, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, or .RData, then the file will be opened and the data will be read in.

      myData <- read.file()   # find the appropriate file using your normal operating system

      • Alternatively, go to your friendly text editor or data manipulation program (e.g., Excel) and copy the data to the clipboard. Include a first line that has the variable labels. Paste it into psych using the read.clipboard.tab command:

      myData <- read.clipboard.tab()   # if on the clipboard

      Note that there are a number of options for read.clipboard for reading in Excel based files, lower triangular files, etc.

      3. Make sure that what you just read is right. Describe it (section 3.3) and perhaps look at the first and last few lines. If you have multiple groups, try describeBy.

      dim(myData)   # What are the dimensions of the data?

      describe(myData)   # or

      describeBy(myData, group = myData$mygroups)   # descriptive statistics by groups (mygroups stands for your grouping column)

      headTail(myData)   # show the first and last n lines of a file

      4. Look at the patterns in the data. If you have fewer than about 12 variables, look at the SPLOM (Scatter Plot Matrix) of the data using pairs.panels (section 3.4.1). Then use the outlier function to detect outliers.

      pairs.panels(myData)

      outlier(myData)

      5. Note that you might have some weird subjects, probably due to data entry errors. Either edit the data by hand (use the edit command) or just scrub the data (section 3.3.2).

      cleaned <- scrub(myData, max = 9)   # e.g., change anything greater than 9 to NA

      6. Graph the data with error bars for each variable (section 3.4.3).

      error.bars(myData)

      7. Find the correlations of all of your data. lowerCor will by default find the pairwise correlations, round them to 2 decimals, and display the lower off diagonal matrix.

      • Descriptively (just the values) (section 3.4.7)

      r <- lowerCor(myData)   # The correlation matrix, rounded to 2 decimals

      • Graphically (section 3.4.8). Another way is to show a heat map of the correlations with the correlation values included.

      corPlot(r)   # examine the many options for this function

      • Inferentially (the values, the ns, and the p values) (section 3.5)

      corr.test(myData)

      8. Apply various regression models.

      Several functions are meant to do multiple regressions, either from the raw data or from a variance/covariance matrix or a correlation matrix.

      • setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

      myData <- sat.act

      colnames(myData) <- c("mod1", "med1", "x1", "x2", "y1", "y2")

      setCor(y = c("y1", "y2"), x = c("x1", "x2"), data = myData)

      • mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

      mediate(y = c("y1", "y2"), x = c("x1", "x2"), m = "med1", data = myData)

      • mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

      mediate(y = c("y1", "y2"), x = c("x1", "x2"), m = "med1", mod = "mod1", data = myData)
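Step 8 notes that a multiple regression can be found from a correlation matrix as well as from raw data. A minimal base R sketch of why that works is given below. It is not the setCor implementation; the simulated variables x1, x2, and y are hypothetical. The standardized weights solve Rxx β = rxy, and R² is the sum of β times rxy.

```r
# Standardized regression weights from a correlation matrix alone.
# Assumption: simulated data; x1, x2, y are illustrative names only.
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 0.4 * x1 + 0.3 * x2 + rnorm(n)
dat <- data.frame(x1, x2, y)

R   <- cor(dat)
Rxx <- R[c("x1", "x2"), c("x1", "x2")]   # predictor intercorrelations
rxy <- R[c("x1", "x2"), "y"]             # predictor-criterion correlations

beta <- solve(Rxx, rxy)                  # standardized weights
R2   <- sum(beta * rxy)                  # squared multiple correlation

# Cross-check against lm() fitted to standardized raw data
fit <- lm(y ~ x1 + x2, data = as.data.frame(scale(dat)))

print(round(beta, 3))
print(round(R2, 3))
```

The two routes agree because standardizing the raw data turns the normal equations into the same system expressed in correlations.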

      0.2 Psychometric functions are summarized in the second vignette

      Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at http://personality-project.org/r.

      • Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).

      fa.parallel(myData)

      vss(myData)

      • Factor analyze the data with a specified number of factors (the default is 1). The default method is minimum residual, and the default rotation for more than one factor is oblimin. There are many more possibilities. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979). Also consider a hierarchical factor solution to find coefficient ω.

      fa(myData)

      iclust(myData)

      omega(myData)

      If you prefer to do a principal components analysis, you may use the principal function. The default is one component.

      principal(myData)

      • Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.

      alpha(myData)   # score all of the items as part of one scale

      myKeys <- make.keys(nvar = 20, list(first = c(1, -3, 5, -7, 8:10), second = c(2, 4, -6, 11:15, -16)))

      my.scores <- scoreItems(myKeys, myData)   # form several scales

      my.scores   # show the highlights of the results
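What a signed scoring key and coefficient α amount to can be sketched in base R. This is an illustration only, not the code behind make.keys, scoreItems, or alpha. The six simulated items, the 1 to 6 response scale, and the key vector are all assumptions made for the example.

```r
# Scoring a scale with a signed key and computing raw coefficient alpha.
# Assumption: six simulated items on a 1-6 scale; items 2 and 5 are
# negatively worded. Names and key are hypothetical.
set.seed(42)
n     <- 300
trait <- rnorm(n)
raw <- sapply(1:6, function(i) {
  direction <- if (i %in% c(2, 5)) -1 else 1
  z <- direction * trait + rnorm(n)
  as.numeric(cut(z, breaks = c(-Inf, -1.5, -0.5, 0, 0.5, 1.5, Inf)))  # codes 1..6
})
colnames(raw) <- paste0("item", 1:6)

keys <- c(1, -1, 1, 1, -1, 1)            # -1 marks a reverse keyed item

# Reverse keyed items are flipped as (max + min) - x, here 7 - x
keyed  <- raw
is_rev <- keys < 0
keyed[, is_rev] <- (6 + 1) - raw[, is_rev]

scale_score <- rowMeans(keyed)           # unweighted average of keyed items

# Raw alpha: k/(k-1) * (1 - sum of item variances / variance of total score)
k         <- ncol(keyed)
item_var  <- apply(keyed, 2, var)
alpha_raw <- k / (k - 1) * (1 - sum(item_var) / var(rowSums(keyed)))

print(round(alpha_raw, 2))
```

Forgetting to reverse the negatively keyed items before averaging lowers α sharply, which is one reason to specify the signs in the key rather than recoding by hand.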

      At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette as well as the Overview Vignette to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.

      1 Overview of this and related documents

      The psych package (Revelle, 2015) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, in prep), a draft of which is available at http://personality-project.org/r/book/.

      Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.

      Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes five methods of factor analysis (minimum residual, principal axis, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
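The logic of the parallel analysis criterion mentioned above can be sketched in a few lines of base R: compare the eigenvalues of the observed correlation matrix with the average eigenvalues from random data of the same size, and keep the leading components that exceed chance. This is a simplified principal components version under assumed simulated data with two underlying factors; it is not the fa.parallel implementation.

```r
# A simplified parallel analysis on simulated two-factor data.
# Assumption: 6 variables, 3 per factor; all names are illustrative.
set.seed(123)
n <- 500
p <- 6
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- cbind(sapply(1:3, function(i) 0.8 * f1 + 0.6 * rnorm(n)),
           sapply(1:3, function(i) 0.8 * f2 + 0.6 * rnorm(n)))

# Eigenvalues of the observed correlation matrix
obs_eig <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues from random normal data of the same dimensions
n_sim <- 50
sim_eig <- replicate(n_sim, {
  r <- matrix(rnorm(n * p), n, p)
  eigen(cor(r), symmetric = TRUE, only.values = TRUE)$values
})
rand_mean <- rowMeans(sim_eig)

# Count leading observed eigenvalues that exceed their random counterparts
n_comp <- sum(cumprod(obs_eig > rand_mean))

print(round(rbind(observed = obs_eig, random = rand_mean), 2))
print(n_comp)
```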

      Bifactor and hierarchical factor structures may be estimated by using Schmid Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.

      Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.

      For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy will also find within group (or subject) correlations as well as the between group correlation.
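The distinction between within group and between group correlations can be made concrete with a small base R sketch. It is only an illustration of the decomposition, not the statsBy code; the simulated grouping variable and effects are assumptions chosen so that the two levels have opposite signs.

```r
# Between-group and pooled within-group correlations on simulated data.
# Assumption: 20 groups of 15 cases; the group level relation is positive,
# the within-group relation is negative. All names are illustrative.
set.seed(7)
g  <- rep(1:20, each = 15)
gx <- rnorm(20)
gy <- 0.8 * gx + 0.6 * rnorm(20)         # group level effects
wx <- rnorm(300)
wy <- -0.5 * wx + rnorm(300)             # within-group deviations
x  <- gx[g] + wx
y  <- gy[g] + wy

# Between groups: correlate the group means
r_between <- cor(as.vector(tapply(x, g, mean)), as.vector(tapply(y, g, mean)))

# Within groups (pooled): correlate deviations from each group's mean
r_within <- cor(x - ave(x, g), y - ave(y, g))

# The ordinary correlation mixes the two levels
r_total <- cor(x, y)

print(round(c(between = r_between, within = r_within, total = r_total), 2))
```

The total correlation sits between the two and describes neither level well, which is the motivation for decomposing it.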

      multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject; mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
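As a rough picture of what a wide to long conversion involves, the following sketch uses base R's reshape on a tiny made-up data frame. It does not reproduce the output format of mlArrange; the column names id, item, and response are hypothetical.

```r
# Wide (one row per subject) to long (one row per subject x item).
# Assumption: a made-up data frame; column names are illustrative only.
wide <- data.frame(id    = 1:3,
                   item1 = c(4, 2, 5),
                   item2 = c(3, 3, 1),
                   item3 = c(5, 1, 2))

long <- reshape(wide,
                direction = "long",
                varying   = c("item1", "item2", "item3"),
                v.names   = "response",   # name of the stacked value column
                timevar   = "item",       # which item each row came from
                times     = 1:3,
                idvar     = "id")

# Sort by subject, then item, and drop the generated row names
long <- long[order(long$id, long$item), ]
rownames(long) <- NULL

print(long)
```

In the long form each observation has its own row, which is the layout that multilevel modeling functions such as those in nlme generally expect.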

      Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram, and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

      This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at http://personality-project.org/r/psych_manual.pdf). The vignette, psych for sem, at psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at http://personality-project.org/r/book/psych_for_sem.pdf).

      For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at http://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at http://personality-project.org/r/book.

      2 Getting started

      Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid, is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

      install.packages("ctv")

      library(ctv)

      install.views("Psychometrics")

      The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just the following packages:

      install.packages(c("GPArotation", "mnormt"))

      Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

      3 Basic data analysis

      A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

      Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:

      library(psych)

      The other packages, once installed, will be called automatically by psych.

      It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

      .First <- function(x) {library(psych)}

      3.1 Getting the data by using read.file

      Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) convenient, a helpful alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .txt, .text, .r, .rds, .rda, .csv, .xpt, or .sav, the file will be read correctly.

      my.data <- read.file()

      If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

      my.data <- read.file(widths = c(4, rep(1, 35)))   # will read in a file without a header row and 36 fields, the first of which is 4 columns, the rest of which are 1 column each

      If the file is a .RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata) the object will be loaded. Depending on what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT) it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

      To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

      my.spss <- read.file(use.value.labels = TRUE)   # this will keep the value labels for .sav files
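The idea of choosing a reader from the file suffix, as this section describes, can be sketched with base R. The helper read_by_suffix below is hypothetical and covers only three suffixes. It is not the read.file source, which handles more formats and opens a file chooser.

```r
# Dispatching on a file suffix to pick a reader.
# Assumption: read_by_suffix is a hypothetical helper written for this sketch.
read_by_suffix <- function(path, ...) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
         csv  = read.csv(path, ...),
         txt  = ,                                  # falls through to "text"
         text = read.table(path, header = TRUE, ...),
         rds  = readRDS(path),
         stop("unsupported suffix: ", ext))
}

# Write the same small data frame in three formats, then read each back
demo <- data.frame(id = 1:3, score = c(10.5, 12, 9.25))

csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
rds_path <- tempfile(fileext = ".rds")
write.csv(demo, csv_path, row.names = FALSE)
write.table(demo, txt_path, row.names = FALSE, quote = FALSE)
saveRDS(demo, rds_path)

from_csv <- read_by_suffix(csv_path)
from_txt <- read_by_suffix(txt_path)
from_rds <- read_by_suffix(rds_path)

print(from_csv)
```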

      32 Data input from the clipboard

      There are of course many ways to enter data into R Reading from a local file usingreadtable is perhaps the most preferred However many users will enter their datain a text editor or spreadsheet program and then want to copy and paste into R Thismay be done by using readtable and specifying the input file as ldquoclipboardrdquo (PCs) orldquopipe(pbpaste)rdquo (Macs) Alternatively the readclipboard set of functions are perhapsmore user friendly

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

mydata <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited, either by setting the separator or by using the read.clipboard.tab function.

> mydata <- read.clipboard(sep="\t")   #define the tab option, or

> mytabdata <- read.clipboard.tab()   #just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).

> mydata <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
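The widths vector used above can be tried without a clipboard. The following is a minimal sketch that uses base R's read.fwf on two made-up lines of text; it is meant only to show how the eleven widths carve up a 24 character record, and it is not how read.clipboard.fwf is implemented.

```r
# Base R sketch of fixed width parsing; the two data lines are invented.
widths <- c(5, 2, rep(1, 5), rep(3, 4))      # 11 fields, 24 characters per line
lines <- c("123456712345100200300400",
           "543217654321400300200100")
con <- textConnection(lines)
fwf_data <- read.fwf(con, widths = widths)   # columns are named V1 ... V11
close(con)
print(dim(fwf_data))                         # 2 rows, 11 columns
print(fwf_data$V1)                           # the first field is 5 characters wide
```

Each element of widths consumes that many characters from the start of what remains of the line, so the widths must sum to the record length.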

3.3 Basic descriptive statistics

Once the data are read in, describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data), and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.)

> library(psych)

> data(sat.act)

> describe(sat.act)   #basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59

          kurtosis   se
gender       -1.62 0.02
education    -0.07 0.05
age           2.42 0.36
ACT           0.53 0.18
SATV          0.33 4.27
SATQ         -0.02 4.41
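To make the columns of this output concrete, here is a minimal base R sketch that computes a few of the same statistics for one variable. The helper basic_stats and the scores are invented for illustration and are not part of psych; missing values are dropped first, which is consistent with the n of 687 reported for SATQ above.

```r
# Hypothetical helper (not part of psych): a few of the statistics describe reports.
basic_stats <- function(x) {
  x <- x[!is.na(x)]                     # drop missing values before counting
  n <- length(x)
  c(n = n, mean = mean(x), sd = sd(x), median = median(x),
    min = min(x), max = max(x), range = max(x) - min(x),
    se = sd(x) / sqrt(n))               # standard error of the mean
}
scores <- c(200, 450, 500, 620, 620, 700, 800, NA)   # made-up SAT-like scores
print(round(basic_stats(scores), 2))
```

The se column is simply sd divided by the square root of n, which can be checked against the table: 112.90 / sqrt(700) is about 4.27 for SATV.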

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> #basic descriptive statistics by a grouping variable

> describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)

Descriptive statistics by group

group: 1

          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41

------------------------------------------------------------

group: 2

          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> samat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),

+ skew=FALSE,ranges=FALSE,mat=TRUE)

> headTail(samat)

        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...     <NA>   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
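The idea can be sketched in base R with the mahalanobis function. This is an illustration on simulated data with one planted outlier, under the assumption of complete data; it is not the psych implementation of outlier, and the object name d2_sketch is invented.

```r
# Base R sketch of the idea behind outlier(); not the psych implementation.
set.seed(42)
x <- matrix(rnorm(200 * 3), ncol = 3)                # simulated: 200 cases, 3 variables
x[1, ] <- c(6, -6, 6)                                # plant one obvious outlier
d2_sketch <- mahalanobis(x, colMeans(x), cov(x))     # squared distance from the centroid
expected <- qchisq(ppoints(nrow(x)), df = ncol(x))   # chi square quantiles, same df
qq <- cbind(expected = sort(expected, decreasing = TRUE),
            observed = sort(d2_sketch, decreasing = TRUE))
print(head(round(qq, 2), 3))                         # the three most extreme cases
print(which.max(d2_sketch))                          # row index of the most extreme case
```

Plotting the observed column against the expected column gives the Q-Q display described above; points far above the identity line are candidate outliers.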

> png( 'outlier.png' )

> d2 <- outlier(sat.act,cex=.8)

> dev.off()

null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the x axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

> x <- matrix(1:120,ncol=10,byrow=TRUE)

> colnames(x) <- paste('V',1:10,sep='')

> newx <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)

> newx

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
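The replacement rules applied by scrub in this example can be written out by hand. The sketch below reproduces the same pattern of NA values with a plain loop in base R; the names cols, mins, max_val, bad_val and clean are invented for the illustration, and this is not how scrub itself is coded.

```r
# Base R sketch of the cleaning rules in the scrub example above.
x <- matrix(1:120, ncol = 10, byrow = TRUE)
colnames(x) <- paste0("V", 1:10)
cols <- 3:5                    # columns to clean
mins <- c(30, 40, 50)          # one minimum per cleaned column
max_val <- 70                  # common maximum
bad_val <- 45                  # a specific value to treat as missing
clean <- x
for (k in seq_along(cols)) {
  v <- clean[, cols[k]]
  v[v < mins[k] | v > max_val | v == bad_val] <- NA
  clean[, cols[k]] <- v
}
print(clean[, 3:5])
print(sum(is.na(clean)))       # number of cells set to NA
```

Only columns 3 to 5 are touched, so the remaining columns keep all twelve of their values.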

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
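Both recodings are easy to picture in base R. The sketch below builds one binary column per category and, separately, converts a character variable to numeric codes. The data are invented, and the code illustrates the idea only; it does not claim to reproduce the output format of dummy.code or char2numeric.

```r
# Base R sketch of dummy coding and of character-to-numeric conversion.
major <- c("psych", "bio", "psych", "chem", "bio")    # made-up categorical variable
levels_found <- sort(unique(major))
dummies <- sapply(levels_found, function(lev) as.integer(major == lev))
print(dummies)                      # one 0/1 column per category

# Numeric codes follow the alphabetical order of the categories.
major_numeric <- as.numeric(factor(major))
print(major_numeric)
```

Each row of dummies contains exactly one 1, so any one column is redundant given the others; in a regression, one category is usually dropped as the reference.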

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data and for communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

> png( 'pairspanels.png' )

> sat.d2 <- data.frame(sat.act,d2)   #combine the d2 statistics from before with the sat.act data.frame

> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)

> dev.off()

null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')

> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,

+ main="Affect varies by movies ")

> dev.off()

null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75],list(

+ EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",

+ "lively", "-sleepy", "-tired", "-drowsy"),

+ TA =c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",

+ "-placid", "-calm", "-at.rest") ,

+ PA =c("active", "excited", "strong", "inspired", "determined", "attentive",

+ "interested", "enthusiastic", "proud", "alert"),

+ NAf =c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",

+ "upset", "hostile", "irritable" )) )

> scores <- scoreItems(keys,msq[,1:75])

> png('msq.png')

> pairs.panels(scores$scores,smoother=TRUE,

+ main ="Density distributions of four measures of affect" )

> dev.off()

null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)

> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

[Plot: "Density Plot by gender for SAT V and Q"; y axis "Observed" (200 to 800); groups SATV M, SATV F, SATQ M, SATQ F]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
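The standard error of a proportion used for tabular data is simple enough to compute directly. The following base R sketch uses an invented proportion and sample size and a normal-theory 95% interval; it illustrates the formula rather than the internals of error.bars.tab.

```r
# Standard error of a proportion and a normal-theory 95% interval (made-up values).
p <- 0.30                       # observed proportion
N <- 700                        # sample size
q <- 1 - p
se_p <- sqrt(p * q / N)         # sigma_p = sqrt(pq/N)
ci <- p + c(-1, 1) * qnorm(0.975) * se_p
print(round(c(se = se_p, lower = ci[1], upper = ci[2]), 3))
```

Because pq is largest at p = 0.5, proportions near one half have the widest error bars for a given N.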

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)

> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

[Plot: "0.95% confidence limits"; x axis "Independent Variable" (bfagree, bfcon, bfext, bfneur, bfopen); y axis "Dependent Variable"]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,

+ labels=c("Male","Female"),ylab="SAT score",xlab="")

[Plot: "0.95% confidence limits"; bars for Male and Female; y axis "SAT score" (200 to 800)]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))

> rownames(T) <- c("M","F")

> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",

+ main="Proportion of sample by education level")

[Plot: "Proportion of sample by education level"; x axis "Level of Education" (M 0 to M 5); y axis "Proportion of Education Level" (0.00 to 0.30)]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds row wise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function (Figure 9). For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))

> data(affect)

> colors <- c("black","red","white","blue")

> films <- c("Sad","Horror","Neutral","Happy")

> affectstats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,

+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,

+ cex=2,colors=colors, main ="Movies effect on arousal")

> errorCircles("PA2","NA2",data=affectstats,labels=films,xlab="Positive Affect",

+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")

> op <- par(mfrow=c(1,1))

[Plot, left panel: "Movies effect on arousal"; x axis "Energetic Arousal"; y axis "Tense Arousal"; points labeled Sad, Horror, Neutral, Happy]

[Plot, right panel: "Movies effect on affect"; x axis "Positive Affect"; y axis "Negative Affect"; points labeled Sad, Horror, Neutral, Happy]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)

> png( 'bibars.png' )

> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))

> dev.off()

null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)

> male <- subset(sat.act,sat.act$gender==1)

> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower,upper)

> round(both,2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one matrix below the diagonal and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)

> round(diffs,2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
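The layout of these combined matrices can be reproduced with base R indexing. The sketch below takes the first three male and female correlations from the output above (education, age, ACT) and assembles both displays by hand; the object names are invented, and this illustrates the layout only, not the code of lowerUpper.

```r
# Base R sketch of the lowerUpper layout, using the rounded values printed above.
vars <- c("education", "age", "ACT")
lower_r <- matrix(c(1, .61, .16,  .61, 1, .15,  .16, .15, 1), 3,
                  dimnames = list(vars, vars))           # males
upper_r <- matrix(c(1, .52, .16,  .52, 1, .08,  .16, .08, 1), 3,
                  dimnames = list(vars, vars))           # females

both_sketch <- lower_r                                   # males stay below the diagonal
both_sketch[upper.tri(both_sketch)] <- upper_r[upper.tri(upper_r)]
diag(both_sketch) <- NA

diff_sketch <- lower_r                                   # males below the diagonal
diff_sketch[upper.tri(diff_sketch)] <- (lower_r - upper_r)[upper.tri(lower_r)]
diag(diff_sketch) <- NA

print(both_sketch)
print(round(diff_sketch, 2))
```

The entries above the diagonal of diff_sketch are the male value minus the female value, which matches the 0.09 shown for education and age in the output above (0.61 - 0.52).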

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
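The Holm correction itself is available in base R through p.adjust. A minimal sketch with five invented raw probabilities shows how the adjusted values relate to the raw ones; it illustrates the correction, not the bookkeeping corr.test does to place the values above the diagonal.

```r
# Holm adjustment of a set of made-up raw probabilities using stats::p.adjust.
raw_p <- c(0.02, 0.58, 0.33, 0.62, 0.001)
holm_p <- p.adjust(raw_p, method = "holm")
print(rbind(raw = raw_p, holm = holm_p))
```

The smallest raw probability is multiplied by the number of tests, the next smallest by one fewer, and so on, with the results capped at 1 and never allowed to decrease, which is why several adjusted values in Table 1 print as 1.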

> png('corplot.png')

> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")

> dev.off()

null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')

> circ <- sim.circ(24)

> rcirc <- cor(circ)

> corPlot(rcirc,main='24 variables in a circumplex')

> dev.off()

null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')

> op <- par(mfrow=c(2,2))

> spider(y=c(1,6,12,18),x=1:24,data=rcirc,fill=TRUE,main="Spider plot of 24 circumplex variables")

> op <- par(mfrow=c(1,1))

> dev.off()

null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix

          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size

          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests.)

          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)

Correlation tests

Call:r.test(n = 50, r12 = 0.3)

Test of significance of a correlation

t value 2.18 with probability < 0.034

and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)

Correlation tests

Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)

Test of difference between two independent correlations

z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)

Correlation tests

Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"

Test of difference between two correlated correlations

t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B

Correlation tests

Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,

r24 = 0.8)

Test of difference between two dependent correlations

z value -1.2 with probability 0.23
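The first two cases can be checked by hand with the textbook formulas. The sketch below, in base R, computes the t test and Fisher z confidence interval for a single correlation (case 1) and the z test for two independent correlations (case 2), using the same inputs as the calls above. It assumes the standard formulas and is not taken from the r.test source; the two dependent cases (Steiger A and B) are more involved and are not attempted here.

```r
# Case 1: one correlation, n = 50, r = 0.3
n <- 50; r <- 0.3
t_val <- r * sqrt(n - 2) / sqrt(1 - r^2)                 # t with n - 2 df
p_val <- 2 * pt(-abs(t_val), df = n - 2)                 # two-tailed probability
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))   # Fisher z interval
print(round(c(t = t_val, p = p_val, lower = ci[1], upper = ci[2]), 3))

# Case 2: two independent correlations, n = 30 in each sample
n1 <- 30; n2 <- 30; r12 <- 0.4; r34 <- 0.6
z_diff <- (atanh(r12) - atanh(r34)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p_diff <- 2 * pnorm(-abs(z_diff))
print(round(c(z = abs(z_diff), p = p_diff), 2))
```

Rounded to two places, these agree with the r.test output shown above: t = 2.18 with an interval of 0.02 to 0.53 for case 1, and z = 0.99 with probability 0.32 for case 2.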

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although the answer is obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices

Call:cortest(R1 = sat.act)

Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
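The logic of this test can be sketched in a few lines of base R. The example below uses simulated data and takes the statistic to be (n - 3) times the sum of the squared Fisher z values of the off-diagonal correlations, with one degree of freedom per correlation. That form is an assumption made for illustration; cortest may differ in details such as how missing data and sample sizes are handled.

```r
# Sketch of a chi square test that all correlations are zero (simulated data).
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 4), ncol = 4)
x[, 2] <- x[, 1] + rnorm(n)            # make two of the variables correlate
r <- cor(x)
z <- atanh(r[lower.tri(r)])            # Fisher z of each off-diagonal correlation
chi2 <- (n - 3) * sum(z^2)             # assumed form of the statistic
df <- length(z)                        # p * (p - 1) / 2 correlations
p_value <- pchisq(chi2, df, lower.tail = FALSE)
print(c(chi2 = chi2, df = df, p = p_value))
```

With 6 variables there are 15 off-diagonal correlations, which is the df reported for sat.act above.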

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix formed from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
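The smoothing steps just described can be followed in base R on a small invented matrix. In the sketch below the matrix r_bad and the eigenvalue floor of 1e-6 are assumptions chosen for illustration; cor.smooth uses its own tolerance and may differ in detail.

```r
# Sketch of eigenvalue smoothing for a matrix that is not positive semi-definite.
r_bad <- matrix(c( 1.0, 0.9, -0.9,
                   0.9, 1.0,  0.9,
                  -0.9, 0.9,  1.0), 3)       # made-up, internally inconsistent
e <- eigen(r_bad, symmetric = TRUE)
print(min(e$values))                          # one eigenvalue is negative
vals <- pmax(e$values, 1e-6)                  # hypothetical floor for small eigenvalues
vals <- vals * nrow(r_bad) / sum(vals)        # rescale to sum to the number of variables
r_smooth <- e$vectors %*% diag(vals) %*% t(e$vectors)
r_smooth <- cov2cor(r_smooth)                 # put 1s back on the diagonal
print(round(r_smooth, 2))
```

The smoothed matrix keeps the eigenvectors of the original, so the pattern of correlations is preserved while their sizes shrink enough to be jointly possible.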

> draw.tetra()

[Plot: bivariate normal distribution with rho = 0.5 and phi = 0.33, cut at τ on X and Τ on Y into four regions (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with the marginal normal densities dnorm(x) drawn along each axis]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Plot: "Bivariate density, rho = 0.5"; perspective plot with axes x, y, z]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values, or the group means.
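Equation 1 can be checked numerically with a few lines of base R. The sketch below uses simulated data with a hypothetical grouping variable `g`; it illustrates the identity itself and is not the statsBy code.

```r
set.seed(123)
g <- rep(1:4, each = 25)                 # hypothetical grouping variable
x <- rnorm(100) + g                      # group means differ on x
y <- 0.4 * x - 0.8 * g + rnorm(100)      # and on y

xb <- ave(x, g)                          # group mean, repeated for each person
yb <- ave(y, g)
xw <- x - xb                             # deviation from the group mean
yw <- y - yb

eta_xwg <- cor(x, xw)                    # eta: data with within group values
eta_ywg <- cor(y, yw)
eta_xbg <- cor(x, xb)                    # eta: data with group means
eta_ybg <- cor(y, yb)

r_wg <- cor(xw, yw)                      # pooled within group correlation
r_bg <- cor(xb, yb)                      # between group correlation, weighted by group size

r_xy  <- cor(x, y)
recon <- eta_xwg * eta_ywg * r_wg + eta_xbg * eta_ybg * r_bg
all.equal(r_xy, recon)
```

The two terms reproduce the overall correlation exactly because the within group deviations are uncorrelated with the group means, so the overall covariance is the sum of the within and between covariances.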

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)

faBy(sb,nfactors=5) #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.69            0.63            0.50            0.58
    LetterGroup
           0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.48            0.40            0.25            0.34
    LetterGroup
           0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.59            0.58            0.49            0.58
    LetterGroup
           0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.34            0.34            0.24            0.33
    LetterGroup
           0.20

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohen's Set Correlation R2 = 0.69

Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.58            0.46            0.21            0.18
    LetterGroup
           0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
          0.331           0.210           0.043           0.032
    LetterGroup
          0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.44            0.35            0.17            0.14
    LetterGroup
           0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.19            0.12            0.03            0.02
    LetterGroup
           0.07

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21

Cohen's Set Correlation R2 = 0.42

Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)

                FourLetterWords Suffixes LetterSeries Pedigrees
FourLetterWords            0.52     0.11         0.09      0.06
Suffixes                   0.11     0.60        -0.01      0.01
LetterSeries               0.09    -0.01         0.75      0.28
Pedigrees                  0.06     0.01         0.28      0.66
LetterGroup                0.13     0.03         0.37      0.20
                LetterGroup
FourLetterWords        0.13
Suffixes               0.03
LetterSeries           0.37
Pedigrees              0.20
LetterGroup            0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,...i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
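The path logic can be sketched with ordinary lm calls in base R. The data below are simulated and the names (`c_total`, `c_direct`, `n_iter`) are hypothetical; here `c_total` is the effect of x on y ignoring m and `c_direct` is the effect of x with m in the model. The percentile bootstrap shown is one simple choice and is not claimed to be the procedure used inside mediate.

```r
set.seed(2004)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)                  # mediator depends on x
y <- 0.3 * x + 0.4 * m + rnorm(n)        # outcome depends on x and m

c_total  <- coef(lm(y ~ x))["x"]         # total effect of x on y
a        <- coef(lm(m ~ x))["x"]         # path a: x -> m
fit      <- lm(y ~ x + m)
c_direct <- coef(fit)["x"]               # direct effect of x, holding m constant
b        <- coef(fit)["m"]               # path b: m -> y, holding x constant
ab       <- a * b                        # indirect effect

# Percentile bootstrap of the indirect effect (resampling cases).
n_iter  <- 500
boot_ab <- replicate(n_iter, {
  i <- sample(n, replace = TRUE)
  a_i <- coef(lm(m[i] ~ x[i]))[2]
  b_i <- coef(lm(y[i] ~ x[i] + m[i]))[3]
  unname(a_i * b_i)
})
ci <- quantile(boot_ab, c(0.025, 0.975))
```

For least squares estimates on the same cases the total effect equals the direct effect plus ab exactly, which is why the indirect effect can also be read as the drop in the coefficient of x once m is added to the model.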

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS . The IV (X) was THERAPY . The mediating variable(s) = ATTRIB .

Total Direct effect(c) of THERAPY on SATIS = 0.76 S.E. = 0.31 t direct = 2.5 with probability = 0.019

Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 S.E. = 0.32 t direct = 1.35 with probability = 0.19

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33

Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.04 Upper CI = 0.69

R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c( "SATV","SATQ"),x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c( "SATV" ),x = c("education","age"),m= "ACT", data =sat.act,std=TRUE,n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Figure 16 plot: path diagram titled "Mediation model" linking THERAPY, ATTRIB, and SATIS, with path values 0.82 and 0.4 through ATTRIB and c = 0.76, c' = 0.43 for THERAPY to SATIS.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)

> setCor.diagram(preacher)

[Figure 17 plot: path diagram titled "Regression Models" with THERAPY and ATTRIB predicting SATIS; the values shown are 0.43, 0.4, and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

with $R_{yx}$ the transpose of $R_{xy}$.
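As a minimal base R sketch of this definition, the code below finds the eigen values from the blocks of a correlation matrix and cross-checks them against stats::cancor. The mtcars data set is used purely as a stand-in and the split into x and y sets is arbitrary; this illustrates the formula, not the setCor implementation.

```r
R <- cor(mtcars[, 1:6])                  # stand-in data that ships with R
x <- 1:3                                 # arbitrary predictor set
y <- 4:6                                 # arbitrary criterion set
Rxx <- R[x, x]
Ryy <- R[y, y]
Rxy <- R[x, y]

M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- sort(Re(eigen(M)$values), decreasing = TRUE)  # squared canonical correlations
set_R2 <- 1 - prod(1 - lambda)

# Two cross-checks: canonical correlations from the raw data, and the
# equivalent determinant form of the same statistic.
cc2 <- cancor(as.matrix(mtcars[, x]), as.matrix(mtcars[, y]))$cor^2
alt <- 1 - det(R) / (det(Rxx) * det(Ryy))

# Applying the formula to the squared canonical correlations printed for
# the first Thurstone example in section 5.1:
lam_doc <- c(0.6280, 0.1478, 0.0076, 0.0049)
round(1 - prod(1 - lam_doc), 2)          # 0.69, as reported there
```

Because the determinant form treats the two sets identically, the statistic does not depend on which set is called x and which is called y.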

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
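That regression weights need only the covariance or correlation matrix can be illustrated in base R by solving the normal equations directly. The variables are taken from mtcars as a stand-in, and the sketch covers only slopes and R2; it is not a description of how setCor itself is coded.

```r
vars <- c("mpg", "wt", "hp", "qsec")     # stand-in variables from mtcars
C <- cov(mtcars[, vars])
R <- cor(mtcars[, vars])
y <- "mpg"
x <- c("wt", "hp", "qsec")

b_raw <- solve(C[x, x], C[x, y])         # raw slopes from the covariance matrix
beta  <- solve(R[x, x], R[x, y])         # standardized weights from the correlation matrix
R2    <- sum(beta * R[x, y])             # squared multiple correlation

# The same quantities from the raw data, for comparison.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
all.equal(unname(b_raw), unname(coef(fit)[-1]))
all.equal(R2, summary(fit)$r.squared)
```

Standard errors and significance tests additionally need the number of observations, which is why it has to be supplied when only a matrix is available.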

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")

> model1 <- lm(ACT~ gender + education + age, data=sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,	Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ . The IV (X) was ACT gender ACTXgndr . The mediating variable(s) = education .

Total Direct effect(c) of ACT on SATQ = 0.58 S.E. = 0.03 t direct = 19.25 with probability = 0

Direct effect (c') of ACT on SATQ removing education = 0.59 S.E. = 0.03 t direct = 19.26 with probability = 0

Indirect effect (ab) of ACT on SATQ through education = -0.01

Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.02 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14 S.E. = 0.03 t direct = -4.78 with probability = 2.1e-06

Direct effect (c') of gender on NA removing education = -0.14 S.E. = 0.03 t direct = -4.63 with probability = 4.4e-06

Indirect effect (ab) of gender on SATQ through education = 0

Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.01 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 S.E. = 0.03 t direct = 0.02 with probability = 0.99

Direct effect (c') of ACTXgndr on NA removing education = 0 S.E. = 0.03 t direct = 0.01 with probability = 0.99

Indirect effect (ab) of ACTXgndr on SATQ through education = 0

Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = 0 Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure 18 plot: path diagram titled "Moderation model" with ACT, gender, and ACTXgndr predicting SATQ through education; path labels include 0.16 (c = 0.58, c' = 0.59), 0.09 (c = -0.14, c' = -0.14), -0.01 (c = 0, c' = 0), and -0.04.]

Figure 18: Moderated multiple regression requires the raw data.

Compare this with the output from setCor.

> #compare with setCor

> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03

Cohen's Set Correlation R2 = 0.09

Shrunken Set Correlation R2 = 0.08

F and df of Cohen's Set Correlation 7.26 9 1681.86

Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable         MR1   MR2   MR3    h2    u2   com
Sentences       0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary      0.89  0.06 -0.03  0.84  0.16  1.01
SentCompletion  0.83  0.04  0.00  0.73  0.27  1.00
FirstLetters    0.00  0.86  0.00  0.73  0.27  1.00
4LetterWords   -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes        0.18  0.63 -0.08  0.50  0.50  1.20
LetterSeries    0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees       0.37 -0.05  0.47  0.50  0.50  1.93
LetterGroup    -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings     2.64  1.86  1.5

MR1             1.00  0.59  0.54
MR2             0.59  1.00  0.52
MR3             0.54  0.52  1.00

      7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
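For orientation, a few of the quantities named in this list have simple base R equivalents. The sketch below is an assumption based on the descriptions above (Fisher z as the inverse hyperbolic tangent, and the usual definitions of the geometric and harmonic means); it does not call the psych functions, and the input values are arbitrary.

```r
r      <- 0.5
z      <- atanh(r)                       # Fisher z: 0.5 * log((1 + r) / (1 - r))
r_back <- tanh(z)                        # back-transform to a correlation

v    <- c(1, 2, 4, 8)                    # arbitrary positive values
geo  <- exp(mean(log(v)))                # geometric mean
harm <- 1 / mean(1 / v)                  # harmonic mean
```

The geometric mean suits multiplicative quantities such as ratios, while the harmonic mean suits rates, which is the sense in which different means are appropriate for different kinds of data.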

      8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")

      10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

      11 SessionInfo

      This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

      References

      Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

      Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

      Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

      Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

      Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

      Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

      Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

      Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

      Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

      Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

      Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

      Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

      Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

      Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

      Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

      53

      Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

      Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

      Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

      Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):pp. 65–70.

      Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

      Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

      Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

      Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

      Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

      Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

      MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

      Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

      McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

      Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

      Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


      Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

      Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

      Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

      Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

      Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

      Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

      Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

      Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

      Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

      Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

      Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

      Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

      Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

      Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

      Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

      Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

      Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

      Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

      Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

      Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

      Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

      Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

      Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

      Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

      Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



        7. Find the correlations of all of your data. lowerCor will by default find the pairwise correlations, round them to 2 decimals, and display the lower off-diagonal matrix.

        • Descriptively (just the values) (section 3.4.7)

        r <- lowerCor(myData)   #The correlation matrix, rounded to 2 decimals

        • Graphically (section 3.4.8). Another way is to show a heat map of the correlations with the correlation values included.

        corPlot(r)   #examine the many options for this function

        • Inferentially (the values, the ns, and the p values) (section 3.5)

        corr.test(myData)
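As a rough illustration of what the descriptive step computes, the following base R sketch finds pairwise correlations, rounds them to 2 decimals, and blanks the upper triangle. It is not the lowerCor implementation; the toy data frame and its column names are assumptions made for the example.

```r
# Toy data (hypothetical): three numeric variables with a few missing values
set.seed(42)
toy <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
toy$b[c(3, 10)] <- NA

# Pairwise deletion: each correlation uses all cases available for that pair
R <- cor(toy, use = "pairwise.complete.obs")

# Round to 2 decimals and keep only the diagonal and lower off-diagonal entries
R2 <- round(R, 2)
R2[upper.tri(R2)] <- NA
print(R2, na.print = "")
```

With pairwise deletion the number of cases can differ from one correlation to the next, which is one reason the inferential summary reports the ns alongside the p values.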

        8. Apply various regression models.

        Several functions are meant to do multiple regressions, either from the raw data or from a variance/covariance matrix or a correlation matrix.

        • setCor will take raw data or a correlation matrix and find (and graph the path diagram for) the regression of multiple y variables on multiple x variables.

        myData <- sat.act

        colnames(myData) <- c("mod1","med1","x1","x2","y1","y2")

        setCor(y = c( "y1", "y2"), x = c("x1","x2"), data = myData)

        • mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

        mediate(y = c( "y1", "y2"), x = c("x1","x2"), m= "med1", data = myData)

        • mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

        mediate(y = c( "y1", "y2"), x = c("x1","x2"), m= "med1", mod = "mod1", data = myData)
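To see why a correlation matrix is enough input for a multiple regression, here is a minimal base R sketch of the standard normal equations for standardized weights. It is an illustration under assumed toy data (the variables x1, x2, y1 are simulated here), not the internals of setCor.

```r
# Simulated data (hypothetical): one criterion, two predictors
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y1 <- 0.5 * x1 + 0.3 * x2 + rnorm(n)
dat <- data.frame(x1, x2, y1)

# Everything below uses only the correlation matrix
R   <- cor(dat)
Rxx <- R[c("x1", "x2"), c("x1", "x2")]        # predictor intercorrelations
Rxy <- R[c("x1", "x2"), "y1", drop = FALSE]   # predictor-criterion correlations
beta <- solve(Rxx, Rxy)                       # standardized regression weights
R2   <- sum(beta * Rxy)                       # squared multiple correlation

# Cross-check against lm() on the standardized raw data
z   <- as.data.frame(scale(dat))
fit <- lm(y1 ~ x1 + x2, data = z)
```

The weights from the correlation matrix agree with the slopes from lm on the z-scored data, which is what makes analysis from a published correlation matrix possible when the raw data are unavailable.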

        0.2 Psychometric functions are summarized in the second vignette

        Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at http://personality-project.org/r.

        • Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).

        fa.parallel(myData)

        vss(myData)

        • Factor analyze the data with a specified number of factors (the default is 1). The default method is minimum residual, and the default rotation for more than one factor is oblimin. There are many more possibilities. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979). Also consider a hierarchical factor solution to find coefficient ω.

        fa(myData)

        iclust(myData)

        omega(myData)

        If you prefer to do a principal components analysis, you may use the principal function. The default is one component.

        principal(myData)

        • Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.

        alpha(myData)   #score all of the items as part of one scale

        myKeys <- make.keys(nvar=20,list(first = c(1,-3,5,-7,8:10),second=c(2,4,-6,11:15,-16)))

        my.scores <- scoreItems(myKeys,myData)   #form several scales

        my.scores   #show the highlights of the results
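For readers who want to see the arithmetic behind coefficient α and an unweighted scale score, the following base R sketch applies the usual covariance-based formula to simulated items. The data and object names are assumptions for illustration only; the alpha and scoreItems functions report considerably more than this.

```r
# Simulated items (hypothetical): five items sharing one common factor
set.seed(7)
n <- 300
g <- rnorm(n)
items <- sapply(1:5, function(i) g + rnorm(n))

# Coefficient alpha from the item covariance matrix:
# alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
k <- ncol(items)
C <- cov(items)
alpha_raw <- (k / (k - 1)) * (1 - sum(diag(C)) / sum(C))

# An unweighted scale score is just the mean of the items for each subject
scale_score <- rowMeans(items)
```

Items keyed negatively (the minus signs in the keys list above) would need to be reversed before averaging.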

        At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette as well as the Overview Vignette to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.

        1 Overview of this and related documents

        The psych package (Revelle, 2015) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at http://personality-project.org/r/book/.

        Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.

        Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes five methods of factor analysis (minimum residual, principal axis, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
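The logic of parallel analysis can be sketched in a few lines of base R: compare the eigenvalues of the observed correlation matrix with the average eigenvalues of random data of the same size. This is a simplified components-only version in the spirit of Horn (1965), with simulated data and object names that are assumptions for the example; fa.parallel offers more options and also works with factors.

```r
# Simulated data (hypothetical): six variables driven by one common factor
set.seed(11)
n <- 300
p <- 6
g <- rnorm(n)
x <- sapply(1:p, function(i) 0.7 * g + rnorm(n))

# Eigenvalues of the observed correlation matrix
obs <- eigen(cor(x), only.values = TRUE)$values

# Reference eigenvalues: average over 100 random normal data sets of the same size
sim <- replicate(100,
                 eigen(cor(matrix(rnorm(n * p), n, p)), only.values = TRUE)$values)
ref <- rowMeans(sim)

# Count the observed eigenvalues that exceed their random-data counterparts
n_comp <- sum(obs > ref)
```

Plotting obs and ref against component number gives the familiar scree-style display with the random-data line overlaid.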

        Bifactor and hierarchical factor structures may be estimated by using Schmid Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.

        Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.

        For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy will also find within group (or subject) correlations as well as the between group correlation.

        multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject, and mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
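The wide-to-long conversion mentioned here can be pictured with a small base R sketch. The data frame, its column names, and the layout of the result are assumptions chosen for illustration; they are not the output format of mlArrange.

```r
# Wide format (hypothetical): one row per subject, one column per item
wide <- data.frame(id    = 1:3,
                   time  = c(1, 1, 1),
                   item1 = c(4, 2, 5),
                   item2 = c(3, 3, 1))

# Long format: one row per subject-by-item observation
long <- data.frame(
  id    = rep(wide$id,   times = 2),
  time  = rep(wide$time, times = 2),
  item  = rep(c("item1", "item2"), each = nrow(wide)),
  value = c(wide$item1, wide$item2)
)
long
```

Multilevel modeling functions typically expect this long layout, with the grouping variable (here id) repeated on every row.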

        Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

        This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at http://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at http://personality-project.org/r/book/psych_for_sem.pdf.)

        For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at http://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at http://personality-project.org/r/book.

        2 Getting started

        Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid, is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

        install.packages("ctv")

        library(ctv)

        install.views("Psychometrics")

        The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just the following packages:

        install.packages(c("GPArotation","mnormt"))

        Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

        3 Basic data analysis

        A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

        Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:

        library(psych)

        The other packages, once installed, will be called automatically by psych.

        It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

        .First <- function(x) {library(psych)}

        3.1 Getting the data by using read.file

        Although many find it convenient to copy the data to the clipboard and then use the read.clipboard functions (see below), a helpful alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .txt, .text, .r, .rds, .rda, .csv, .xpt, or .sav, the file will be read correctly.

        my.data <- read.file()

        If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

        my.data <- read.file(widths = c(4,rep(1,35)))   #will read in a file without a header row and 36 fields, the first of which is 4 columns, the rest of which are 1 column each

        If the file is a .RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata) the object will be loaded. Depending on what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT) it will be read. .csv files (comma separated files), normal .txt or .text files, .data, or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
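The idea of choosing a reader from the file suffix can be sketched in base R as follows. The helper read_by_suffix is hypothetical and handles only three suffixes; it is meant to show the dispatch idea and the role of the header argument, not to reproduce read.file.

```r
# Hypothetical helper: pick a base R reader from the file suffix
read_by_suffix <- function(path, header = TRUE) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
         csv  = read.csv(path, header = header),
         txt  = ,                                  # falls through to the next entry
         text = read.table(path, header = header),
         rds  = readRDS(path),
         stop("unsupported suffix: ", ext))
}

# Write a small comma separated file with a header row, then read it back
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = c(2.5, NA, 4)), tmp, row.names = FALSE)
d <- read_by_suffix(tmp)
d
```

If the file had no header row, passing header = FALSE would keep the first line as data and give the columns default names.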

        To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

        my.spss <- read.file(use.value.labels=TRUE)   #this will keep the value labels for .sav files

        3.2 Data input from the clipboard

        There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

        read.clipboard is the base function for reading data from the clipboard.

        read.clipboard.csv for reading text that is comma delimited.

        read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

        read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

        read.clipboard.upper for reading input of an upper triangular matrix.

        read.clipboard.fwf for reading in fixed width fields (some very old data sets).

        For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

        my.data <- read.clipboard()

        This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

        > my.data <- read.clipboard(sep="\t")   #define the tab option, or

        > my.tab.data <- read.clipboard.tab()   #just use the alternative function
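Why empty spreadsheet cells call for the tab option can be shown without a clipboard, using base R read.table on a small text string. The string below stands in for what a spreadsheet would place on the clipboard; it is an assumption for the example.

```r
# Three rows copied from a spreadsheet, with two empty cells (hypothetical data)
txt <- "id\tx\ty\n1\t5\t2\n2\t\t3\n3\t4\t\n"

# With an explicit tab separator, empty cells are read as NA
tab <- read.table(text = txt, sep = "\t", header = TRUE)
tab

# With the default whitespace separator, consecutive tabs collapse into one,
# so rows with empty cells appear to have too few fields and the read fails
ws <- try(read.table(text = txt, header = TRUE), silent = TRUE)
inherits(ws, "try-error")
```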

        For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, and the last 4 are 3 columns).

        > my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
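The same widths vector can be tried out with base R read.fwf on a temporary file. The two records below are made up for the example; each is 24 characters long, matching the sum of the widths.

```r
# Field widths: one 5-column field, one 2-column field, five 1-column fields,
# and four 3-column fields (11 fields, 24 characters per record)
widths <- c(5, 2, rep(1, 5), rep(3, 4))

# Two hypothetical fixed width records with no delimiters
records <- c("123451212345111222333444",
             "543219954321444333222111")
tmp <- tempfile()
writeLines(records, tmp)

d <- read.fwf(tmp, widths = widths)
d
```

Because there are no delimiters, a single wrong width shifts every later field, so it is worth checking the first few rows after reading.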

        3.3 Basic descriptive statistics

        Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

        describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data), and will be flagged with an *.

        describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.)

        > library(psych)
        > data(sat.act)
        > describe(sat.act)   #basic descriptive statistics

                  vars   n   mean     sd median trimmed    mad min max range  skew
        gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
        education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
        age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
        ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
        SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
        SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59
                  kurtosis   se
        gender       -1.62 0.02
        education    -0.07 0.05
        age           2.42 0.36
        ACT           0.53 0.18
        SATV          0.33 4.27
        SATQ         -0.02 4.41
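A few of these columns are easy to reproduce by hand, which can help when checking output. The base R sketch below defines a small hypothetical helper, basic_stats, that drops missing values and returns n, mean, sd, median, and se; it then confirms that the se column reported above is sd divided by the square root of n. It covers only a subset of what describe reports.

```r
# Hypothetical helper: a handful of the statistics, after dropping NAs
basic_stats <- function(v) {
  v <- v[!is.na(v)]
  c(n = length(v), mean = mean(v), sd = sd(v), median = median(v),
    se = sd(v) / sqrt(length(v)))
}

# Toy variable with one missing value
set.seed(5)
v <- c(rnorm(20, mean = 50, sd = 10), NA)
round(basic_stats(v), 2)

# The se values in the sat.act output follow from the reported sd and n
round(4.82 / sqrt(700), 2)     # ACT:  0.18
round(115.64 / sqrt(687), 2)   # SATQ: 4.41
```

Note that n for SATQ is 687 rather than 700 because missing values are dropped variable by variable.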

        These data may then be analyzed by groups defined in a logical statement or by some other variable, e.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

        > #basic descriptive statistics by a grouping variable
        > describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)

         Descriptive statistics by group
        group: 1
                  vars   n   mean     sd   se
        gender       1 247   1.00   0.00 0.00
        education    2 247   3.00   1.54 0.10
        age          3 247  25.86   9.74 0.62
        ACT          4 247  28.79   5.06 0.32
        SATV         5 247 615.11 114.16 7.26
        SATQ         6 245 635.87 116.02 7.41
        ------------------------------------------------------------
        group: 2
                  vars   n   mean     sd   se
        gender       1 453   2.00   0.00 0.00
        education    2 453   3.26   1.35 0.06
        age          3 453  25.45   9.37 0.44
        ACT          4 453  28.42   4.69 0.22
        SATV         5 453 610.66 112.31 5.28
        SATQ         6 442 596.00 113.07 5.38
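The split-then-summarize pattern behind a grouped description can be sketched in base R. The toy data frame and the by_group object are assumptions for the example, and only n, mean, sd, and se are computed.

```r
# Toy data (hypothetical): one score, two groups, one missing value
toy <- data.frame(group = c(1, 1, 1, 2, 2, 2),
                  score = c(10, 12, NA, 20, 22, 24))

# Split the scores by group, summarize each piece, and stack the results
by_group <- do.call(rbind, lapply(split(toy$score, toy$group), function(v) {
  v <- v[!is.na(v)]
  data.frame(n = length(v), mean = mean(v), sd = sd(v),
             se = sd(v) / sqrt(length(v)))
}))
by_group
```

As in the output above, n can differ between groups when missing values are dropped within each group.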

        The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

        > sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
        +     skew=FALSE,ranges=FALSE,mat=TRUE)
        > headTail(sa.mat)

                item group1 group2 vars    n   mean     sd    se
        gender1    1      1      0    1   27      1      0     0
        gender2    2      2      0    1   30      2      0     0
        gender3    3      1      1    1   20      1      0     0
        gender4    4      2      1    1   25      2      0     0
        ...     <NA>   <NA>   <NA>  ...  ...    ...    ...   ...
        SATQ9     69      1      4    6   51  635.9 104.12 14.58
        SATQ10    70      2      4    6   86 597.59 106.24 11.46
        SATQ11    71      1      5    6   46 657.83  89.61 13.21
        SATQ12    72      2      5    6   93 606.72 105.55 10.95

        3.3.1 Outlier detection using outlier

        One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
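The computation described here can be sketched with base R functions. The simulated data and the planted outlier are assumptions for illustration, and the sketch assumes complete data; it is not the outlier function itself.

```r
# Simulated data (hypothetical): 200 cases on 3 variables, one planted outlier
set.seed(3)
x <- matrix(rnorm(200 * 3), ncol = 3)
x[1, ] <- c(6, -6, 6)

# Squared Mahalanobis distance of each case from the multivariate centroid
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Expected chi-square quantiles with df equal to the number of variables
expected <- qchisq(ppoints(nrow(x)), df = ncol(x))

# Q-Q plot: sorted distances against expected quantiles (uncomment to draw)
# plot(expected, sort(d2)); abline(0, 1)

# The most extreme case
which.max(d2)
```

Cases that fall well above the line in the Q-Q plot are the candidates for closer inspection.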

        3.3.2 Basic data cleaning using scrub

        If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

        > png( 'outlier.png' )
        > d2 <- outlier(sat.act,cex=.8)
        > dev.off()
        null device
                  1

        Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

        Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

        > x <- matrix(1:120,ncol=10,byrow=TRUE)
        > colnames(x) <- paste('V',1:10,sep='')
        > new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
        > new.x

               V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
         [1,]   1   2 NA NA NA   6   7   8   9  10
         [2,]  11  12 NA NA NA  16  17  18  19  20
         [3,]  21  22 NA NA NA  26  27  28  29  30
         [4,]  31  32 33 NA NA  36  37  38  39  40
         [5,]  41  42 43 44 NA  46  47  48  49  50
         [6,]  51  52 53 54 55  56  57  58  59  60
         [7,]  61  62 63 64 65  66  67  68  69  70
         [8,]  71  72 NA NA NA  76  77  78  79  80
         [9,]  81  82 NA NA NA  86  87  88  89  90
        [10,]  91  92 NA NA NA  96  97  98  99 100
        [11,] 101 102 NA NA NA 106 107 108 109 110
        [12,] 111 112 NA NA NA 116 117 118 119 120

        Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
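The recoding rule applied above can also be written out by hand. The following base R sketch mirrors the rule as described in the text (per-column minimums, one shared maximum, one specific value); it is not the scrub source code, and the names cols, lower, upper and bad_val are chosen here for illustration.

```r
# Sketch in base R of the recoding rule described above (not psych's own code).
x <- matrix(1:120, ncol = 10, byrow = TRUE)
colnames(x) <- paste0("V", 1:10)

cols    <- 3:5             # columns to clean
lower   <- c(30, 40, 50)   # one minimum per cleaned column
upper   <- 70              # one maximum shared by the three columns
bad_val <- 45              # a value treated as missing wherever it occurs in cols

cleaned <- x
for (j in seq_along(cols)) {
  v <- cleaned[, cols[j]]
  v[v < lower[j] | v > upper | v == bad_val] <- NA
  cleaned[, cols[j]] <- v
}
cleaned
```

Columns outside 3 - 5 are left untouched, which is why only the middle of the matrix contains NA.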

        3.3.3 Recoding categorical variables into dummy coded variables

        Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses of these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and the results may be plotted in, e.g., spider plots.

        Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
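Both recodings are easy to picture with a small base R sketch. The variable major and its values are made up for illustration, and the code shows the general idea rather than the implementation of dummy.code or char2numeric.

```r
# Hypothetical categorical variable (values made up for illustration)
major <- c("psych", "bio", "psych", "math", "bio", "psych")

# Dummy coding: one 0/1 indicator column per category
levs <- sort(unique(major))
dummies <- sapply(levs, function(l) as.integer(major == l))
dummies

# Converting character categories to numeric codes
# (codes follow the alphabetical order of the categories)
codes <- as.integer(factor(major))
codes
```

Each row of dummies contains exactly one 1, so the columns can be correlated with other variables as ordinary binary variables.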

        3.4 Simple descriptive graphics

        Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

        3.4.1 Scatter Plot Matrices

        Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

        pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

        Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

        Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

        3.4.2 Density or violin plots

        Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

        > png( 'pairspanels.png' )
        > sat.d2 <- data.frame(sat.act,d2) #combine the d2 statistics from before with the sat.act data.frame
        > pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
        > dev.off()
        null device
                  1

        Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but to show the outliers in color we use the plot characters 21 and 22.

        > png('affect.png')
        > pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
        + main="Affect varies by movies ")
        > dev.off()
        null device
                  1

        Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

        > keys <- make.keys(msq[1:75],list(
        + EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
        + "lively", "-sleepy", "-tired", "-drowsy"),
        + TA =c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
        + "-placid", "-calm", "-at.rest") ,
        + PA =c("active", "excited", "strong", "inspired", "determined", "attentive",
        + "interested", "enthusiastic", "proud", "alert"),
        + NAf =c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
        + "upset", "hostile", "irritable" )) )
        > scores <- scoreItems(keys,msq[,1:75])
        > png('msq.png')
        > pairs.panels(scores$scores,smoother=TRUE,
        + main ="Density distributions of four measures of affect" )
        > dev.off()
        null device
                  1

        Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3,896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

        > data(sat.act)
        > violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

        [Figure: violin plot titled "Density Plot by gender for SAT V and Q"; y axis "Observed" (200 to 800); groups on the x axis are SATV M, SATV F, SATQ M, SATQ F.]

        Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

        3.4.3 Means and error bars

        Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

        error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

        error.bars.by does the same, but grouping the data by some condition.

        error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

        error.crosses draws the confidence intervals for an x set and a y set of the same size.
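The two standard errors behind these displays can be computed directly. The sketch below uses made-up scores and a made-up proportion; using the t distribution for the interval is a choice made for this illustration and is not a statement about the defaults of error.bars.

```r
# Hypothetical scores for one variable (made up for illustration)
x <- c(12, 15, 9, 14, 11, 13, 10, 16)
n <- length(x)

# Standard error of the mean and a 95% confidence interval
se_mean <- sd(x) / sqrt(n)
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se_mean

# Standard error of a proportion: sigma_p = sqrt(p * q / N), with q = 1 - p
p <- 0.3      # hypothetical observed proportion
N <- 200      # hypothetical number of cases
se_p <- sqrt(p * (1 - p) / N)

round(c(mean = mean(x), lower = ci[1], upper = ci[2], se_p = se_p), 3)
```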

        The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

        Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

        3.4.4 Error bars for tabular data

        However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

        > data(epi.bfi)
        > error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

        [Figure: plot titled "0.95% confidence limits"; x axis "Independent Variable" (bfagree, bfcon, bfext, bfneur, bfopen); y axis "Dependent Variable" (50 to 150).]

        Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

        > error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
        + labels=c("Male","Female"),ylab="SAT score",xlab="")

        [Figure: bar graph titled "0.95% confidence limits"; bars for Male and Female; y axis "SAT score" (200 to 800).]

        Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

        > T <- with(sat.act,table(gender,education))
        > rownames(T) <- c("M","F")
        > error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
        + main="Proportion of sample by education level")

        [Figure: bar graph titled "Proportion of sample by education level"; x axis "Level of Education" (M 0 to M 5); y axis "Proportion of Education Level" (0.00 to 0.30).]

        Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

        3.4.5 Two dimensional displays of means and errors

        Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

        > op <- par(mfrow=c(1,2))
        > data(affect)
        > colors <- c("black","red","white","blue")
        > films <- c("Sad","Horror","Neutral","Happy")
        > affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
        + xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
        + cex=2,colors=colors, main =' Movies effect on arousal')
        > errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
        + ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
        > op <- par(mfrow=c(1,1))

        [Figure: two panels. Left: "Movies effect on arousal", x axis "Energetic Arousal", y axis "Tense Arousal". Right: "Movies effect on affect", x axis "Positive Affect", y axis "Negative Affect". Points in both panels are labeled Sad, Horror, Neutral, Happy.]

        Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

        3.4.6 Back to back histograms

        The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

        > data(bfi)
        > png( 'bibars.png' )
        > with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
        > dev.off()
        null device
                  1

        Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

        3.4.7 Correlational structure

        There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

        > lowerCor(sat.act)
                  gendr edctn age   ACT   SATV  SATQ
        gender     1.00
        education  0.09  1.00
        age       -0.02  0.55  1.00
        ACT       -0.04  0.15  0.11  1.00
        SATV      -0.02  0.05 -0.04  0.56  1.00
        SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

        When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

        > female <- subset(sat.act,sat.act$gender==2)
        > male <- subset(sat.act,sat.act$gender==1)
        > lower <- lowerCor(male[-1])
                  edctn age   ACT   SATV  SATQ
        education  1.00
        age        0.61  1.00
        ACT        0.16  0.15  1.00
        SATV       0.02 -0.06  0.61  1.00
        SATQ       0.08  0.04  0.60  0.68  1.00
        > upper <- lowerCor(female[-1])
                  edctn age   ACT   SATV  SATQ
        education  1.00
        age        0.52  1.00
        ACT        0.16  0.08  1.00
        SATV       0.07 -0.03  0.53  1.00
        SATQ       0.03 -0.09  0.58  0.63  1.00
        > both <- lowerUpper(lower,upper)
        > round(both,2)
                  education   age  ACT  SATV  SATQ
        education        NA  0.52 0.16  0.07  0.03
        age            0.61    NA 0.08 -0.03 -0.09
        ACT            0.16  0.15   NA  0.53  0.58
        SATV           0.02 -0.06 0.61    NA  0.63
        SATQ           0.08  0.04 0.60  0.68    NA

        It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

        > diffs <- lowerUpper(lower,upper,diff=TRUE)
        > round(diffs,2)
                  education   age  ACT  SATV SATQ
        education        NA  0.09 0.00 -0.05 0.05
        age            0.61    NA 0.07 -0.03 0.13
        ACT            0.16  0.15   NA  0.08 0.02
        SATV           0.02 -0.06 0.61    NA 0.05
        SATQ           0.08  0.04 0.60  0.68   NA
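The layout of these combined matrices can be reproduced with a few lines of base R. The sketch below uses two simulated correlation matrices (an assumption for illustration) and only mimics the arrangement shown above, with the difference taken as the first matrix minus the second, as in the printed values; it is not the lowerUpper source.

```r
# Two hypothetical 3 x 3 correlation matrices from simulated data
set.seed(3)
a <- cor(matrix(rnorm(300), ncol = 3))   # plays the role of the first group
b <- cor(matrix(rnorm(300), ncol = 3))   # plays the role of the second group

# First group below the diagonal, second group above it
both <- a
both[upper.tri(both)] <- b[upper.tri(b)]
diag(both) <- NA
round(both, 2)

# First group below the diagonal, (first - second) above it
diffs <- a
diffs[upper.tri(diffs)] <- (a - b)[upper.tri(a)]
diag(diffs) <- NA
round(diffs, 2)
```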

        3.4.8 Heatmap displays of correlational structure

        Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

        Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

        3.5 Testing correlations

        Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
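The relation between raw and Holm adjusted probabilities can be seen with cor.test and p.adjust from the stats package. The data below are simulated and the names idx, raw_p and holm_p are chosen for this sketch; it illustrates the adjustment itself, not the internals of corr.test.

```r
# Simulated data: four variables, with one real correlation built in
set.seed(7)
x <- matrix(rnorm(100 * 4), ncol = 4, dimnames = list(NULL, paste0("v", 1:4)))
x[, 2] <- x[, 1] + rnorm(100)

# Raw two-tailed p values for all 6 pairs of variables
idx <- combn(ncol(x), 2)
raw_p <- apply(idx, 2, function(ij) cor.test(x[, ij[1]], x[, ij[2]])$p.value)

# Holm adjustment for the 6 tests
holm_p <- p.adjust(raw_p, method = "holm")

data.frame(pair = apply(idx, 2, function(ij) paste(colnames(x)[ij], collapse = "-")),
           raw  = round(raw_p, 3),
           holm = round(holm_p, 3))
```

The adjusted values are never smaller than the raw ones, which is why entries above the diagonal in Table 1 are at least as large as their counterparts below it.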

        > png('corplot.png')
        > corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
        > dev.off()
        null device
                  1

        Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

        > png('circplot.png')
        > circ <- sim.circ(24)
        > r.circ <- cor(circ)
        > corPlot(r.circ,main='24 variables in a circumplex')
        > dev.off()
        null device
                  1

        Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

        > png('spider.png')
        > op<- par(mfrow=c(2,2))
        > spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
        > op <- par(mfrow=c(1,1))
        > dev.off()
        null device
                  1

        Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

        Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

        > corr.test(sat.act)
        Call:corr.test(x = sat.act)
        Correlation matrix
                  gender education   age   ACT  SATV  SATQ
        gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
        education   0.09      1.00  0.55  0.15  0.05  0.03
        age        -0.02      0.55  1.00  0.11 -0.04 -0.03
        ACT        -0.04      0.15  0.11  1.00  0.56  0.59
        SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
        SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
        Sample Size
                  gender education age ACT SATV SATQ
        gender       700       700 700 700  700  687
        education    700       700 700 700  700  687
        age          700       700 700 700  700  687
        ACT          700       700 700 700  700  687
        SATV         700       700 700 700  700  687
        SATQ         687       687 687 687  687  687
        Probability values (Entries above the diagonal are adjusted for multiple tests.)
                  gender education  age  ACT SATV SATQ
        gender      0.00      0.17 1.00 1.00    1    0
        education   0.02      0.00 0.00 0.00    1    1
        age         0.58      0.00 0.00 0.03    1    1
        ACT         0.33      0.00 0.00 0.00    0    0
        SATV        0.62      0.22 0.26 0.00    0    0
        SATQ        0.00      0.36 0.37 0.00    0    0

        To see confidence intervals of the correlations, print with the short=FALSE option

        Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

        1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

        > r.test(50,.3)
        Correlation tests
        Call:r.test(n = 50, r12 = 0.3)
        Test of significance of a correlation
         t value 2.18    with probability < 0.034
         and confidence interval 0.02 0.53

        2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

        > r.test(30,.4,.6)
        Correlation tests
        Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
        Test of difference between two independent correlations
         z value 0.99    with probability  0.32

        3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

        > r.test(103,.4,.5,.1)
        Correlation tests
        Call:[1] "r.test(n =  103 ,  r12 =  0.4 ,  r23 =  0.1 ,  r13 =  0.5 )"
        Test of difference between two correlated  correlations
         t value -0.89    with probability < 0.37

        4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

        > r.test(103,.5,.6,.7,.5,.5,.8) #steiger Case B
        Correlation tests
        Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
            r24 = 0.8)
        Test of difference between two dependent correlations
         z value -1.2    with probability  0.23
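The first two of these tests follow from textbook formulas and can be checked by hand. The base R sketch below uses the standard t test for a single correlation and the Fisher z test for two independent correlations, with the same inputs as cases 1 and 2; it is a check on the arithmetic rather than a copy of the r.test code.

```r
# Case 1: a single correlation, r = .3 with n = 50
n <- 50; r <- 0.3
t_val <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_t   <- 2 * pt(-abs(t_val), df = n - 2)
# 95% confidence interval by way of the Fisher z transformation
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

# Case 2: two independent correlations, .4 and .6, each based on n = 30
n1 <- 30; n2 <- 30; r12 <- 0.4; r34 <- 0.6
z_val <- (atanh(r12) - atanh(r34)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p_z   <- 2 * pnorm(-abs(z_val))

round(c(t = t_val, p_t = p_t, lower = ci[1], upper = ci[2],
        z = abs(z_val), p_z = p_z), 3)
```

Rounded to two decimals, these values agree with the t, confidence interval, z, and probability printed for cases 1 and 2.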

        To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

        > cortest(sat.act)
        Tests of correlation matrices
        Call:cortest(R1 = sat.act)
         Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
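The idea behind this test can be sketched in base R. Under the null hypothesis each Fisher z has a sampling variance of roughly 1/(n - 3), so the sum of squared z values, scaled by n - 3, is approximately chi square with p(p - 1)/2 degrees of freedom. The scaling by n - 3 and the simulated data are assumptions of this sketch, which is not taken from the cortest source.

```r
# Simulated data with independent columns, so the null hypothesis is true
set.seed(1)
n <- 200; p <- 5
x <- matrix(rnorm(n * p), ncol = p)

R <- cor(x)
z <- atanh(R[lower.tri(R)])          # Fisher z of each distinct correlation

chi2 <- (n - 3) * sum(z^2)           # approximately chi square under the null
df   <- p * (p - 1) / 2              # number of distinct correlations
p_value <- pchisq(chi2, df, lower.tail = FALSE)

c(chi2 = chi2, df = df, p = p_value)
```

With 6 variables, as in sat.act, the same count gives the 15 degrees of freedom reported above.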

        3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

        The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of the multiple cuts is the polychoric correlation.
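How much φ underestimates the latent correlation is easy to show for the special case where both variables are cut at their medians, where φ = (2/π) arcsin(ρ). The sketch below checks this by simulation in base R; the median cuts and the simulated sample are assumptions of the example, and the closed form inversion applies only to that special case (the tetrachoric function handles arbitrary cut points).

```r
rho <- 0.5

# Expected phi when a bivariate normal with correlation rho is cut at 0 on both variables
phi_expected <- 2 * asin(rho) / pi        # equals 1/3 when rho = 0.5

# Check by simulation
set.seed(123)
n <- 1e5
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n) # latent correlation with x is rho
phi_observed <- cor(as.numeric(x > 0), as.numeric(y > 0))

# For median cuts only, the latent correlation is recovered by inverting the formula
rho_recovered <- sin(pi * phi_observed / 2)

round(c(phi_expected = phi_expected, phi_observed = phi_observed,
        rho_recovered = rho_recovered), 3)
```

A latent correlation of 0.5 thus appears as a φ of about 0.33 once both variables are dichotomized at the median.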

        Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

        If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

        A correlation matrix formed from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will also sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
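The steps just described (make the smallest eigen values positive, rescale, rebuild the matrix) can be sketched in base R. The 3 x 3 matrix and the floor eps are made up for illustration, and the sketch follows the verbal description rather than the exact algorithm of cor.smooth.

```r
# Hypothetical "correlation" matrix that is not positive semi-definite
R <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), ncol = 3)

e <- eigen(R, symmetric = TRUE)
e$values                                   # the smallest eigen value is negative

eps  <- 1e-6                               # arbitrary small positive floor (an assumption)
vals <- pmax(e$values, eps)                # make the offending eigen values positive
vals <- vals * ncol(R) / sum(vals)         # rescale to sum to the number of variables

smoothed <- e$vectors %*% diag(vals) %*% t(e$vectors)
smoothed <- cov2cor(smoothed)              # restore a unit diagonal
round(smoothed, 2)
```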

        > draw.tetra()

        [Figure: output of draw.tetra(). A bivariate normal distribution with rho = 0.5 (phi = 0.33) is cut at τ on X and at Τ on Y, giving four regions (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ). The marginal normal densities, dnorm(x), are drawn for X and Y with the cut points marked.]

        Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

        > draw.cor(expand=20,cuts=c(0,0))

        [Figure: perspective plot titled "Bivariate density, rho = 0.5"; axes x, y, and z.]

        Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

        4 Multilevel modeling

        Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

        4.1 Decomposing data into within and between level correlations using statsBy

        There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

        This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

        $r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}}$    (1)

        where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values, or the group means.
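Equation 1 can be verified numerically. In the base R sketch below the grouping variable and the data are simulated (assumptions for illustration); each score is split into its group mean and its deviation from that mean, and the two sides of the equation are compared. Because the group means are expanded back to the individuals, the between group correlation is automatically weighted by group size.

```r
# Simulated two level data: 5 groups of 40 cases
set.seed(2)
g <- rep(1:5, each = 40)
x <- rnorm(200) + g
y <- 0.5 * x + rnorm(200) - 0.3 * g

# Between part = group mean for each case; within part = deviation from it
x_bg <- ave(x, g); x_wg <- x - x_bg
y_bg <- ave(y, g); y_wg <- y - y_bg

lhs <- cor(x, y)
rhs <- cor(x, x_wg) * cor(y, y_wg) * cor(x_wg, y_wg) +   # eta_wg terms * r within
       cor(x, x_bg) * cor(y, y_bg) * cor(x_bg, y_bg)     # eta_bg terms * r between

c(observed = lhs, decomposed = rhs)
```

The two numbers agree to rounding error, since the within and between parts of each variable are uncorrelated with each other.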

        4.2 Generating and displaying multilevel data

        withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

        sim.multilevel will generate simulated data with a multilevel structure.

        The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

        Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

        4.3 Factor analysis by groups

        Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

        sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
        faBy(sb,nfactors=5) #find the 5 factor solution for each education level

        5 Multiple Regression, mediation, moderation, and set correlations

        The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

        5.1 Multiple regression from data or correlation matrices

        The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.69     0.63          0.50      0.58         0.48

multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.48     0.40          0.25      0.34         0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary Sent.Completion First.Letters
     3.69       3.88            3.00          1.35

Unweighted multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.59     0.58          0.49      0.58         0.45

Unweighted multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.34     0.34          0.24      0.33         0.20

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohen's Set Correlation R2 = 0.69

Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.58     0.46          0.21      0.18         0.30

multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
            0.331    0.210         0.043     0.032        0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion First.Letters
           1.02          1.02

Unweighted multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.44     0.35          0.17      0.14         0.26

Unweighted multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.19     0.12          0.03      0.02         0.07

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21

Cohen's Set Correlation R2 = 0.42

Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
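The path logic can be sketched in base R. This is a minimal illustration on simulated data, not the mediate function's implementation: the sample size, the generating coefficients, the helper ab(), and the 500 bootstrap resamples are all assumptions chosen for the example. Here c_total is the effect of x on y ignoring m, and c_direct is the effect of x with m in the model.

```r
# Illustrative only: simulated data; the effect sizes are arbitrary assumptions.
set.seed(42)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.3 * x + 0.4 * m + rnorm(n)
d <- data.frame(x, m, y)

# Indirect effect a*b for one data set
ab <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]        # path a: x -> m
  b <- coef(lm(y ~ x + m, data = d))["m"]    # path b: m -> y, controlling for x
  unname(a * b)
}

c_total  <- unname(coef(lm(y ~ x, data = d))["x"])      # total effect of x
c_direct <- unname(coef(lm(y ~ x + m, data = d))["x"])  # direct effect of x
all.equal(c_total, c_direct + ab(d))   # total = direct + indirect (OLS identity)

# Percentile bootstrap of the indirect effect (500 resamples, for speed)
boot_ab <- replicate(500, ab(d[sample(n, replace = TRUE), ]))
quantile(boot_ab, c(0.025, 0.975))
```

The bootstrap is used because the sampling distribution of a product of two coefficients is generally not normal, so a percentile interval is a more defensible summary than a normal-theory standard error.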


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS . The IV (X) was THERAPY . The mediating variable(s) = ATTRIB .

Total Direct effect(c) of THERAPY on SATIS = 0.76 S.E. = 0.31 t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 S.E. = 0.32 t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.04 Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model". THERAPY -> ATTRIB = 0.82; ATTRIB -> SATIS = 0.4; THERAPY -> SATIS: c = 0.76, c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


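> preacher <- setCor(1, c(2,3), sobel, std=FALSE)

> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models" with THERAPY and ATTRIB predicting SATIS. Regression weights: THERAPY -> SATIS = 0.43; ATTRIB -> SATIS = 0.4. Remaining value shown: 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.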


for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

The eigen values $\lambda_i$ are the squared canonical correlations between the two sets.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
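The two statistics can be compared with a short worked calculation. The sketch below assumes only the four squared canonical correlations printed for the first Thurstone example in section 5.1; the object names are illustrative.

```r
# Squared canonical correlations as printed in the first setCor example
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)

set_r2 <- 1 - prod(1 - lambda)   # Cohen's set correlation R2
avg_r2 <- mean(lambda)           # average squared canonical correlation

round(c(set = set_r2, average = avg_r2), 2)   # set 0.69, average 0.20
```

A single large canonical correlation (0.63 here) is enough to push the set correlation to 0.69, while the average of the squared canonical correlations stays at 0.20, which is the contrast the paragraph above describes.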

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")

> model1 <- lm(ACT ~ gender + education + age, data=sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ . The IV (X) was ACT gender ACTXgndr . The mediating variable(s) = education .

Total Direct effect(c) of ACT on SATQ = 0.58 S.E. = 0.03 t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59 S.E. = 0.03 t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.02 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14 S.E. = 0.03 t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14 S.E. = 0.03 t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.01 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 S.E. = 0.03 t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0 S.E. = 0.03 t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = 0 Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: path diagram titled "Moderation model" with ACT, gender, and ACTXgndr predicting SATQ through education; the a, b, c, and c' path values are those listed in the output above.]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor

> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender education age
  1.01      1.45 1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03

Cohen's Set Correlation R2 = 0.09

Shrunken Set Correlation R2 = 0.08

F and df of Cohen's Set Correlation 7.26 9 1681.86

Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
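The symmetry can be checked numerically. The following base R sketch is an illustration under stated assumptions rather than the setCor code: it uses cancor from the stats package, the built-in mtcars data, and two arbitrarily chosen variable sets; set_r2 is a hypothetical helper.

```r
# Illustrative only: mtcars and the two variable sets are assumptions.
x <- mtcars[, c("wt", "hp", "disp")]
y <- mtcars[, c("mpg", "qsec")]

set_r2 <- function(a, b) {
  rho <- cancor(a, b)$cor      # canonical correlations (stats package)
  1 - prod(1 - rho^2)
}

# Determinant form of the same quantity: 1 - |R| / (|Rxx| |Ryy|)
R <- cor(cbind(x, y))
det_r2 <- 1 - det(R) / (det(cor(x)) * det(cor(y)))

all.equal(set_r2(x, y), set_r2(y, x))   # same in either direction
all.equal(set_r2(x, y), det_r2)         # agrees with the determinant form
```

The determinant form makes the symmetry explicit, because swapping the roles of x and y leaves each of the three determinants unchanged.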

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
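Several of the helpers in the list above wrap short formulas. The sketch below gives base R equivalents for three of them as an illustration of what is being computed; the function names fisher_z, geo_mean, and harm_mean are hypothetical stand-ins, not the psych functions themselves, and they skip the input checking a package function would do.

```r
# Illustrative base R equivalents (helper names are hypothetical, not psych's)
fisher_z  <- function(r) 0.5 * log((1 + r) / (1 - r))   # same as atanh(r)
geo_mean  <- function(x) exp(mean(log(x)))              # for ratios, growth rates
harm_mean <- function(x) 1 / mean(1 / x)                # for rates

fisher_z(0.5)
geo_mean(c(1, 4, 16))    # 4
harm_mean(c(1, 2, 4))    # 12/7
```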

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


        References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


        • Jump starting the psych package: a guide for the impatient
        • Psychometric functions are summarized in the second vignette
        • Overview of this and related documents
        • Getting started
        • Basic data analysis
          • Getting the data by using read.file
          • Data input from the clipboard
          • Basic descriptive statistics
            • Outlier detection using outlier
            • Basic data cleaning using scrub
            • Recoding categorical variables into dummy coded variables
          • Simple descriptive graphics
            • Scatter Plot Matrices
            • Density or violin plots
            • Means and error bars
            • Error bars for tabular data
            • Two dimensional displays of means and errors
            • Back to back histograms
            • Correlational structure
            • Heatmap displays of correlational structure
          • Testing correlations
          • Polychoric, tetrachoric, polyserial, and biserial correlations
        • Multilevel modeling
          • Decomposing data into within and between level correlations using statsBy
          • Generating and displaying multilevel data
          • Factor analysis by groups
        • Multiple Regression, mediation, moderation, and set correlations
          • Multiple regression from data or correlation matrices
          • Mediation and Moderation analysis
          • Set Correlation
        • Converting output to APA style tables using LaTeX
        • Miscellaneous functions
        • Data sets
        • Development version and a user's guide
        • Psychometric Theory
        • SessionInfo

          • Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).

          fa.parallel(myData)

          vss(myData)

          • Factor analyze the data with a specified number of factors (the default is 1). The default method is minimum residual, and the default rotation for more than one factor is oblimin. There are many more possibilities. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979). Also consider a hierarchical factor solution to find coefficient ω.

          fa(myData)

          iclust(myData)

          omega(myData)

          If you prefer to do a principal components analysis, you may use the principal function. The default is one component.

          principal(myData)

          • Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, to find various estimates of internal consistency for these scales, to find their intercorrelations, and to find scores for all the subjects.

          alpha(myData) #score all of the items as part of one scale

          myKeys <- make.keys(nvar=20,list(first = c(1,-3,5,-7,8:10),second=c(2,4,-6,11:15,-16)))

          my.scores <- scoreItems(myKeys,myData) #form several scales

          my.scores #show the highlights of the results
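For readers who want to see what coefficient α summarizes, here is a minimal base-R sketch. It is not the psych implementation: the data are simulated, and the names `items` and `alpha_by_hand` are hypothetical. It uses the textbook formula α = k/(k−1) · (1 − Σ item variances / variance of the total score).

```r
# Sketch only: coefficient alpha from its textbook formula, in base R.
# `items` and `alpha_by_hand` are illustrative names, not part of psych.
set.seed(42)
n <- 200
common <- rnorm(n)                                    # shared source of variance
items <- sapply(1:5, function(i) common + rnorm(n))   # five noisy indicators
colnames(items) <- paste0("item", 1:5)

alpha_by_hand <- function(x) {
  k <- ncol(x)
  item_var  <- apply(x, 2, var)     # variance of each item
  total_var <- var(rowSums(x))      # variance of the unweighted sum score
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

a <- alpha_by_hand(items)
print(round(a, 2))
```

The alpha and scoreItems functions report this statistic along with item-level diagnostics, so the sketch is only meant to make the quantity concrete.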

          At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette, as well as the Overview Vignette, to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.


          1 Overview of this and related documents

          The psych package (Revelle, 2015) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, in prep), a draft of which is available at http://personality-project.org/r/book/.

          Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.

          Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes five methods of factor analysis (minimum residual, principal axis, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
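The idea behind parallel analysis can be illustrated with a short base-R sketch. This is a simplified version that assumes principal-component eigenvalues and simulated two-factor data; all object names are illustrative, and fa.parallel itself offers many more options.

```r
# Sketch only: the logic of parallel analysis on principal-component eigenvalues.
# All object names here are illustrative; this is not the fa.parallel code.
set.seed(1)
n <- 500
f1 <- rnorm(n); f2 <- rnorm(n)                     # two independent latent factors
x <- cbind(sapply(1:3, function(i) 0.8 * f1 + 0.6 * rnorm(n)),
           sapply(1:3, function(i) 0.8 * f2 + 0.6 * rnorm(n)))

obs_eigen <- eigen(cor(x))$values                  # eigenvalues of the observed data

# Eigenvalues from random normal data of the same size, averaged over replications
n_iter <- 20
rand_eigen <- replicate(n_iter, eigen(cor(matrix(rnorm(n * ncol(x)), n)))$values)
mean_rand <- rowMeans(rand_eigen)

# Count the leading eigenvalues that exceed their random counterparts
exceeds <- obs_eigen > mean_rand
n_suggested <- if (all(exceeds)) length(exceeds) else which(!exceeds)[1] - 1
print(n_suggested)
```

Components are retained only while the observed eigenvalue is larger than what random data of the same dimensions would produce.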

          Bifactor and hierarchical factor structures may be estimated by using Schmid-Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.

          Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.

          For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.
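The distinction between within-group and between-group correlations can be seen in a small base-R sketch. It uses the built-in iris data as a stand-in for multilevel data and shows only the core decomposition; statsBy reports these values and considerably more.

```r
# Sketch only: splitting a correlation into between-group and within-group parts.
# Uses the built-in iris data; this is not the statsBy implementation.
x <- iris$Sepal.Length
y <- iris$Sepal.Width
g <- iris$Species

# Between-group correlation: correlate the group means
r_between <- cor(as.numeric(tapply(x, g, mean)), as.numeric(tapply(y, g, mean)))

# Pooled within-group correlation: correlate deviations from each group's mean
r_within <- cor(x - ave(x, g), y - ave(y, g))

r_total <- cor(x, y)
print(round(c(total = r_total, between = r_between, within = r_within), 2))
```

In these data the overall correlation and the pooled within-group correlation have opposite signs, which is exactly the situation where ignoring the grouping is misleading.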

          multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject, and mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
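As an illustration of what a wide-to-long conversion involves, the following sketch uses base R's reshape function on a made-up data frame (`wide` is a hypothetical example). It is not mlArrange, which handles the item and time structure of multilevel data directly.

```r
# Sketch only: base-R wide-to-long conversion of the kind multilevel models need.
# `wide` is a made-up data frame; this uses stats::reshape, not mlArrange.
wide <- data.frame(id = 1:4,
                   t1 = c(3, 5, 2, 4),
                   t2 = c(4, 5, 3, 4),
                   t3 = c(5, 6, 3, 5))

long <- reshape(wide, direction = "long",
                varying = c("t1", "t2", "t3"),   # the repeated measures
                v.names = "score",               # name of the stacked value column
                timevar = "time", times = 1:3,   # occasion index
                idvar = "id")
long <- long[order(long$id, long$time), ]        # one row per subject per occasion
print(long)
```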

          Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

          This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at http://personality-project.org/r/psych_manual.pdf). The vignette, psych for sem, at psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at http://personality-project.org/r/book/psych_for_sem.pdf).

          For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at http://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at http://personality-project.org/r/book.

          2 Getting started

          Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

          install.packages("ctv")

          library(ctv)

          install.views("Psychometrics")

          The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just two packages:

          install.packages(c("GPArotation","mnormt"))

          Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

          3 Basic data analysis

          A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

          Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:

          library(psych)

          The other packages, once installed, will be called automatically by psych.

          It is possible to automatically load psych and other functions by creating and then saving a ".First" function: e.g.,

          .First <- function(x) {library(psych)}

          3.1 Getting the data by using read.file

          Although many find it convenient to copy the data to the clipboard and then use the read.clipboard functions (see below), a helpful alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .txt, .text, .r, .rds, .rda, .csv, .xpt, or .sav, the file will be read correctly.

          my.data <- read.file()
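The suffix-based dispatch described above can be sketched in a few lines of base R. The helper `read_by_suffix` is hypothetical and covers only a few suffixes; it is meant to show the idea, not how read.file is actually written.

```r
# Sketch only: choose a reader from the file suffix, as the text describes.
# `read_by_suffix` is a hypothetical helper, not the psych implementation,
# and it covers only a few of the suffixes that read.file handles.
read_by_suffix <- function(path, ...) {
  suffix <- tolower(tools::file_ext(path))
  switch(suffix,
         csv  = read.csv(path, ...),
         txt  = ,
         text = read.table(path, header = TRUE, ...),
         rds  = readRDS(path),
         stop("unsupported suffix: ", suffix))
}

# Usage: write a small temporary csv file and read it back
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = c(2.5, 3.5, 4.5)), tmp, row.names = FALSE)
my_data <- read_by_suffix(tmp)
print(my_data)
```

Unlike read.file, this sketch does not open a file chooser; the path is passed explicitly so that the example can run unattended.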

          If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

          my.data <- read.file(widths = c(4,rep(1,35))) #will read in a file without a header row and 36 fields, the first of which is 4 columns, the rest of which are 1 column each.

          If the file is a .RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata) the object will be loaded. Depending what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT) it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

          To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

          my.spss <- read.file(use.value.labels=TRUE) #this will keep the value labels for .sav files

          3.2 Data input from the clipboard

          There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

          read.clipboard is the base function for reading data from the clipboard.

          read.clipboard.csv for reading text that is comma delimited.

          read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

          read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

          read.clipboard.upper for reading input of an upper triangular matrix.

          read.clipboard.fwf for reading in fixed width fields (some very old data sets).

          For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

          my.data <- read.clipboard()

          This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited, either by specifying the separator or by using the read.clipboard.tab function.

          > my.data <- read.clipboard(sep="\t") #define the tab option, or

          > my.tab.data <- read.clipboard.tab() #just use the alternative function
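To see why the tab separator matters when missing values are empty cells, here is a small base-R sketch. The pasted spreadsheet is simulated with a character string (the object `pasted` is made up), so no clipboard is needed.

```r
# Sketch only: why tab-delimited reading matters when missing cells are empty.
# The pasted text is simulated with a string; no clipboard is involved.
pasted <- "id\tx\ty\n1\t10\t3\n2\t\t4\n3\t12\t"

dat <- read.table(text = pasted, header = TRUE, sep = "\t")
print(dat)    # the empty cells are read as NA
```

With an explicit tab separator, an empty cell is still counted as a field, so each value stays in its own column and the blanks become NA.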

          For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).

          > my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
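The same widths can be tried without a clipboard by using base R's read.fwf on a text connection. The two records below are invented for illustration; each is 5 + 2 + 5×1 + 4×3 = 24 characters wide.

```r
# Sketch only: fixed width fields with base R's read.fwf, using the same widths.
# The two records are invented; each is 24 characters wide.
widths <- c(5, 2, rep(1, 5), rep(3, 4))
records <- c("000011212345100200300400",
             "000023454321110210310410")

fwf <- read.fwf(textConnection(records), widths = widths)
print(fwf)
```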

          3.3 Basic descriptive statistics

          Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

          describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data), and will be flagged with an *.

          describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.)

          > library(psych)
          > data(sat.act)
          > describe(sat.act) #basic descriptive statistics
                    vars   n   mean     sd median trimmed    mad min max range  skew
          gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
          education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
          age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
          ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
          SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
          SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59
                    kurtosis   se
          gender       -1.62 0.02
          education    -0.07 0.05
          age           2.42 0.36
          ACT           0.53 0.18
          SATV          0.33 4.27
          SATQ         -0.02 4.41

          These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

          > #basic descriptive statistics by a grouping variable.
          > describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)

           Descriptive statistics by group
          group: 1
                    vars   n   mean     sd   se
          gender       1 247   1.00   0.00 0.00
          education    2 247   3.00   1.54 0.10
          age          3 247  25.86   9.74 0.62
          ACT          4 247  28.79   5.06 0.32
          SATV         5 247 615.11 114.16 7.26
          SATQ         6 245 635.87 116.02 7.41
          ------------------------------------------------------------
          group: 2
                    vars   n   mean     sd   se
          gender       1 453   2.00   0.00 0.00
          education    2 453   3.26   1.35 0.06
          age          3 453  25.45   9.37 0.44
          ACT          4 453  28.42   4.69 0.22
          SATV         5 453 610.66 112.31 5.28
          SATQ         6 442 596.00 113.07 5.38

          The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

          > sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
          + skew=FALSE,ranges=FALSE,mat=TRUE)
          > headTail(sa.mat)
                  item group1 group2 vars    n   mean     sd    se
          gender1    1      1      0    1   27      1      0     0
          gender2    2      2      0    1   30      2      0     0
          gender3    3      1      1    1   20      1      0     0
          gender4    4      2      1    1   25      2      0     0
          ...      ...   <NA>   <NA>  ...  ...    ...    ...   ...
          SATQ9     69      1      4    6   51  635.9 104.12 14.58
          SATQ10    70      2      4    6   86 597.59 106.24 11.46
          SATQ11    71      1      5    6   46 657.83  89.61 13.21
          SATQ12    72      2      5    6   93 606.72 105.55 10.95

          3.3.1 Outlier detection using outlier

          One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
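The computation described above can be sketched with base R's mahalanobis function. The example assumes the built-in iris measurements as stand-in data and only builds the numbers behind the Q-Q comparison; it is not the outlier function, which also draws and labels the plot.

```r
# Sketch only: squared Mahalanobis distances compared with chi-square quantiles.
# Uses the built-in iris measurements; this is not the psych outlier function.
x <- as.matrix(iris[, 1:4])
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))   # squared distances

# Expected chi-square quantiles with df = number of variables
n <- nrow(x)
expected <- qchisq(ppoints(n), df = ncol(x))

# Pair the sorted distances with their expected values; large gaps flag outliers
qq <- data.frame(expected = expected, observed = sort(d2))
print(tail(qq, 3))
most_extreme <- order(d2, decreasing = TRUE)[1:3]           # rows to inspect
print(most_extreme)
```

Sorting the distances, as in the last lines, is the same idea as sorting d2 to find the cases worth a closer look.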

          3.3.2 Basic data cleaning using scrub

          If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

          > png( 'outlier.png' )
          > d2 <- outlier(sat.act,cex=.8)
          > dev.off()
          null device
                    1

          Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the x axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

          Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column).

          > x <- matrix(1:120,ncol=10,byrow=TRUE)
          > colnames(x) <- paste('V',1:10,sep='')
          > new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
          > new.x
                 V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
           [1,]   1   2 NA NA NA   6   7   8   9  10
           [2,]  11  12 NA NA NA  16  17  18  19  20
           [3,]  21  22 NA NA NA  26  27  28  29  30
           [4,]  31  32 33 NA NA  36  37  38  39  40
           [5,]  41  42 43 44 NA  46  47  48  49  50
           [6,]  51  52 53 54 55  56  57  58  59  60
           [7,]  61  62 63 64 65  66  67  68  69  70
           [8,]  71  72 NA NA NA  76  77  78  79  80
           [9,]  81  82 NA NA NA  86  87  88  89  90
          [10,]  91  92 NA NA NA  96  97  98  99 100
          [11,] 101 102 NA NA NA 106 107 108 109 110
          [12,] 111 112 NA NA NA 116 117 118 119 120

          Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
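The replacement rules used in the scrub example can be written out explicitly in base R. This sketch is not how scrub is implemented; the variable names (`cols`, `mins`, `max_allowed`, `bad_value`) are illustrative and simply make each rule visible.

```r
# Sketch only: the same replacement rules written out in base R.
# This is not how scrub is implemented; it just makes the rules explicit.
x <- matrix(1:120, ncol = 10, byrow = TRUE)
colnames(x) <- paste0("V", 1:10)

cols <- 3:5               # columns to clean
mins <- c(30, 40, 50)     # column-specific minimums
max_allowed <- 70         # one maximum for all three columns
bad_value <- 45           # a value to be treated as missing

new_x <- x
for (j in seq_along(cols)) {
  v <- new_x[, cols[j]]
  v[v < mins[j] | v > max_allowed | v == bad_value] <- NA
  new_x[, cols[j]] <- v
}
print(new_x)
```

The result can be compared cell by cell with the scrub output shown above.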

          3.3.3 Recoding categorical variables into dummy coded variables

          Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and the results may be plotted in, e.g., spider plots.

          Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
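Both operations can be illustrated in base R. The vectors `major` and `sex` are invented for the example, and the sketch shows the general idea rather than the exact output format of dummy.code or char2numeric.

```r
# Sketch only: dummy coding and character-to-numeric conversion in base R.
# `major` and `sex` are invented vectors; dummy.code and char2numeric are the psych tools.
major <- c("psych", "bio", "psych", "math", "bio")

# One binary (0/1) column per category
levels_major <- sort(unique(major))
dummies <- sapply(levels_major, function(lev) as.integer(major == lev))
print(dummies)

# Converting a character column to numeric codes (alphabetical level order)
sex <- c("Male", "Female", "Female", "Male")
sex_numeric <- as.numeric(factor(sex))
print(sex_numeric)
```

Each row of the dummy coded matrix contains exactly one 1, marking the category that case belongs to.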

          3.4 Simple descriptive graphics

          Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

          3.4.1 Scatter Plot Matrices

          Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

          pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.

          Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

          Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

          3.4.2 Density or violin plots

          Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

          > png( 'pairspanels.png' )
          > sat.d2 <- data.frame(sat.act,d2) #combine the d2 statistics from before with the sat.act data.frame
          > pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
          > dev.off()
          null device
                    1

          Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

          gt png(affectpng)gt pairspanels(affect[1417]bg=c(redblackwhiteblue)[affect$Film]pch=21

          + main=Affect varies by movies )

          gt devoff()

          null device

          1

          Figure 3 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The coloringrepresent four different movie conditions

> keys <- make.keys(msq[1:75], list(
+     EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+            "lively", "-sleepy", "-tired", "-drowsy"),
+     TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+            "-placid", "-calm", "-at.rest"),
+     PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+            "interested", "enthusiastic", "proud", "alert"),
+     NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+             "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+     main="Density distributions of four measures of affect")
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3,896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6], sat.act$gender, grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")

[Plot omitted: violin plots titled "Density Plot by gender for SAT V and Q"; x axis groups SATV M, SATV F, SATQ M, SATQ F; y axis "Observed", 200 to 800.]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
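As a rough illustration of the quantities these functions plot, the following base R sketch computes a normal-theory 95% confidence interval for a mean and the standard error of a proportion, sqrt(pq/N). It does not call psych: the helper names mean_ci and se_prop and the toy numbers are assumptions made for this example, and the psych functions may differ in detail (for instance by using a t rather than a normal quantile).

```r
# Hypothetical helpers, not part of psych: they only illustrate the arithmetic.

# Normal-theory confidence interval for the mean of a numeric vector.
mean_ci <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  se <- sd(x) / sqrt(length(x))          # standard error of the mean
  half <- qnorm(1 - alpha / 2) * se      # half-width of the interval
  c(lower = mean(x) - half, mean = mean(x), upper = mean(x) + half)
}

# Standard error of a proportion: sqrt(p * q / N) with q = 1 - p.
se_prop <- function(p, N) sqrt(p * (1 - p) / N)

x  <- c(2, 4, 4, 4, 5, 5, 7, 9)          # toy data (assumed)
ci <- mean_ci(x)
sp <- se_prop(p = 0.25, N = 700)         # toy proportion and sample size (assumed)
print(round(ci, 2))
print(round(sp, 4))
```

The interval is symmetric around the mean by construction, which is why error bars of this kind cannot reflect skew in the data.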

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10], epi.bfi$epilie<4)

[Plot omitted: "0.95 confidence limits"; x axis "Independent Variable" (bfagree, bfcon, bfext, bfneur, bfopen); y axis "Dependent Variable".]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+     labels=c("Male","Female"), ylab="SAT score", xlab="")

[Plot omitted: bar graph titled "0.95 confidence limits"; x axis groups Male and Female; y axis "SAT score", 200 to 800.]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+     main="Proportion of sample by education level")

[Plot omitted: bar graph titled "Proportion of sample by education level"; x axis "Level of Education" (bars labeled M 0 through M 5); y axis "Proportion of Education Level", 0.00 to 0.30.]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+     cex=2,colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+     ylab="Negative Affect", pch=16,cex=2,colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Plot omitted: two panels. Left: "Movies effect on arousal", x axis "Energetic Arousal" (8 to 20), y axis "Tense Arousal" (10 to 22). Right: "Movies effect on affect", x axis "Positive Affect" (6 to 12), y axis "Negative Affect" (2 to 10). Points in both panels are labeled Sad, Horror, Neutral, and Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi, bi.bars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
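The layout produced above can be sketched in base R with lower.tri and upper.tri. This is only an illustration of the idea, not the psych implementation: the helper name combine_lower_upper is hypothetical, and the two small matrices reuse a few of the rounded male and female correlations printed above.

```r
# Hypothetical helper, not psych's lowerUpper: combine two square matrices.
combine_lower_upper <- function(lower, upper, diff = FALSE) {
  stopifnot(all(dim(lower) == dim(upper)))
  out <- lower                             # first group stays below the diagonal
  out[upper.tri(out)] <- if (diff) {
    (lower - upper)[upper.tri(out)]        # first minus second, above the diagonal
  } else {
    upper[upper.tri(out)]                  # second group above the diagonal
  }
  diag(out) <- NA                          # the diagonal belongs to neither group
  out
}

vars <- c("age", "ACT", "SATV")
male <- matrix(c( 1.00, 0.15, -0.06,
                  0.15, 1.00,  0.61,
                 -0.06, 0.61,  1.00), 3, 3, dimnames = list(vars, vars))
female <- matrix(c( 1.00, 0.08, -0.03,
                    0.08, 1.00,  0.53,
                   -0.03, 0.53,  1.00), 3, 3, dimnames = list(vars, vars))

both  <- combine_lower_upper(male, female)
diffs <- combine_lower_upper(male, female, diff = TRUE)
print(both)
print(round(diffs, 2))
```

Reading across a row and down the matching column then compares the two groups on the same pair of variables.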

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
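To make the Holm adjustment concrete, here is a small base R sketch that compares p.adjust (from the stats package) with the same correction written out by hand. The three raw p-values are made up for illustration, and holm_by_hand is a hypothetical helper, not a psych function.

```r
# Toy raw p-values (assumed for illustration, not taken from sat.act).
p_raw <- c(0.01, 0.04, 0.03)

# Holm (1979): sort ascending, multiply the i-th smallest by (m - i + 1),
# then enforce monotonicity with a running maximum and cap at 1.
holm_by_hand <- function(p) {
  m <- length(p)
  o <- order(p)
  adj <- pmin(1, cummax((m - seq_len(m) + 1) * p[o]))
  adj[order(o)]                            # return in the original order
}

p_holm <- p.adjust(p_raw, method = "holm")
print(p_holm)
print(holm_by_hand(p_raw))
```

The adjusted values are never smaller than the raw ones, which is why the entries above the diagonal in Table 1 are at least as large as their counterparts below it.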

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n =  103 ,  r12 =  0.4 ,  r23 =  0.1 ,  r13 =  0.5 )"
Test of difference between two correlated  correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23
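The first three cases can be reproduced with a few lines of base R. The sketch below uses the standard textbook formulas (a t test for a single correlation with a Fisher z confidence interval, a z test for two independent correlations, and Williams' t for two dependent correlations that share a variable). The helper names are hypothetical and this is not the psych source code, but the statistics agree with the output printed above to two decimals (up to the sign of z in case 2).

```r
# Hypothetical helpers that mirror the textbook formulas; they are not psych code.

# Case 1: t test of a single correlation, with a Fisher z confidence interval.
r_single <- function(n, r, alpha = 0.05) {
  t <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p <- 2 * pt(-abs(t), df = n - 2)
  half <- qnorm(1 - alpha / 2) / sqrt(n - 3)
  ci <- tanh(atanh(r) + c(-1, 1) * half)
  list(t = t, p = p, ci = ci)
}

# Case 2: z test of the difference between two independent correlations.
r_independent <- function(n, r12, r34, n2 = n) {
  z <- (atanh(r12) - atanh(r34)) / sqrt(1 / (n - 3) + 1 / (n2 - 3))
  list(z = z, p = 2 * pnorm(-abs(z)))
}

# Case 3: Williams' t for r12 versus r13 when both involve variable 1.
r_dependent <- function(n, r12, r13, r23) {
  detR <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23
  rbar <- (r12 + r13) / 2
  t <- (r12 - r13) * sqrt((n - 1) * (1 + r23) /
         (2 * ((n - 1) / (n - 3)) * detR + rbar^2 * (1 - r23)^3))
  list(t = t, p = 2 * pt(-abs(t), df = n - 3))
}

case1 <- r_single(50, 0.3)
case2 <- r_independent(30, 0.4, 0.6)
case3 <- r_dependent(103, r12 = 0.4, r13 = 0.5, r23 = 0.1)
print(round(c(t = case1$t, p = case1$p, case1$ci), 3))
print(round(unlist(case2), 3))
print(round(unlist(case3), 3))
```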

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
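A minimal sketch of the Fisher z version of this test, under the assumption that the statistic is (n - 3) times the sum of the squared Fisher z values of the distinct off-diagonal correlations, with p(p - 1)/2 degrees of freedom. The helper steiger_chi2 and the toy matrix are hypothetical, and cortest itself may differ in its defaults and in how it handles raw data.

```r
# Hypothetical helper sketching the Fisher z version of Steiger's (1980) test.
steiger_chi2 <- function(R, n) {
  z <- atanh(R[lower.tri(R)])              # Fisher z of each distinct correlation
  chi2 <- (n - 3) * sum(z^2)
  df <- length(z)                          # p * (p - 1) / 2 distinct correlations
  list(chi2 = chi2, df = df, p = pchisq(chi2, df, lower.tail = FALSE))
}

# Toy 3 x 3 correlation matrix and sample size (assumed for illustration).
R <- matrix(c(1.0, 0.3, 0.2,
              0.3, 1.0, 0.1,
              0.2, 0.1, 1.0), 3, 3)
res  <- steiger_chi2(R, n = 100)
null <- steiger_chi2(diag(3), n = 100)     # an identity matrix gives chi square of 0
print(res)
print(null)
```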

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
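The claim that φ is simply a Pearson r on dichotomous data can be checked directly. The base R sketch below computes φ from a made-up two by two table and compares it with cor applied to the expanded 0/1 data; the table counts and the helper phi_from_table are assumptions for illustration only.

```r
# Toy 2 x 2 table of counts (assumed): rows are X = 0/1, columns are Y = 0/1.
tab <- matrix(c(40, 10,
                20, 30), 2, 2, byrow = TRUE)

# phi = (n00 * n11 - n01 * n10) / sqrt(product of the four marginal totals)
phi_from_table <- function(tab) {
  n00 <- tab[1, 1]; n01 <- tab[1, 2]; n10 <- tab[2, 1]; n11 <- tab[2, 2]
  (n00 * n11 - n01 * n10) /
    sqrt((n00 + n01) * (n10 + n11) * (n00 + n10) * (n01 + n11))
}

# Expand the table into one row per case and take the ordinary Pearson r.
counts <- c(tab[1, 1], tab[1, 2], tab[2, 1], tab[2, 2])
x <- rep(c(0, 0, 1, 1), times = counts)
y <- rep(c(0, 1, 0, 1), times = counts)

phi <- phi_from_table(tab)
print(c(phi = phi, pearson = cor(x, y)))
```

A tetrachoric estimate for the same table would be larger in absolute value than φ, which is the underestimation the paragraph above describes.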

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
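The smoothing steps just described (lift the offending eigenvalues, rescale them to sum to the number of variables, rebuild the matrix) can be sketched in base R as follows. The function smooth_cor, its tolerance, and the deliberately improper toy matrix are assumptions made for this example; cor.smooth may differ in its details.

```r
# Hypothetical sketch of eigenvalue smoothing; psych's cor.smooth may differ in detail.
smooth_cor <- function(R, eig_tol = 1e-12) {
  e <- eigen(R, symmetric = TRUE)
  vals <- e$values
  if (min(vals) >= eig_tol) return(R)      # already positive definite
  vals[vals < eig_tol] <- 100 * eig_tol    # lift the offending eigenvalues
  vals <- vals * nrow(R) / sum(vals)       # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  S <- cov2cor(S)                          # restore a unit diagonal
  dimnames(S) <- dimnames(R)
  S
}

# An improper "correlation" matrix (assumed toy values): not positive semi-definite.
R_bad <- matrix(c( 1.0, 0.9, -0.9,
                   0.9, 1.0,  0.9,
                  -0.9, 0.9,  1.0), 3, 3)
R_ok <- smooth_cor(R_bad)
print(round(eigen(R_bad, symmetric = TRUE)$values, 3))
print(round(R_ok, 3))
```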

          4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use

> draw.tetra()

[Plot omitted: bivariate normal distribution with rho = 0.5 and phi = 0.33, cut at X = τ and Y = Τ into four quadrants (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with marginal normal densities dnorm(x) shown on axes running from -3 to 3.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20, cuts=c(0,0))

[Plot omitted: perspective plot titled "Bivariate density, rho = 0.5" with axes x, y, and z.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups ($r_{wg}$) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values, or the group means.
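Equation 1 can be verified numerically in base R. In the sketch below the data are simulated (the seed, group sizes and coefficients are arbitrary assumptions), each variable is split into its group mean and its deviation from that mean with ave, and the observed correlation is rebuilt from the within and between pieces. It illustrates the identity only; statsBy reports these quantities along with much more.

```r
set.seed(42)                               # toy data, assumed for illustration
g <- rep(1:4, each = 25)                   # grouping variable
x <- rnorm(100) + g                        # group means differ
y <- 0.5 * x + rnorm(100) - 0.3 * g

# Split each variable into its group mean (between) and deviation (within).
xb <- ave(x, g); xw <- x - xb
yb <- ave(y, g); yw <- y - yb

r_wg <- cor(xw, yw)                        # pooled within group correlation
r_bg <- cor(xb, yb)                        # between group correlation, weighted by group size
eta_xwg <- cor(x, xw); eta_ywg <- cor(y, yw)
eta_xbg <- cor(x, xb); eta_ybg <- cor(y, yb)

r_xy <- cor(x, y)
r_rebuilt <- eta_xwg * eta_ywg * r_wg + eta_xbg * eta_ybg * r_bg
print(c(observed = r_xy, rebuilt = r_rebuilt, within = r_wg, between = r_bg))
```

The within and between correlations can differ in size and even in sign, which is why neither level licenses inferences about the other.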

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.69     0.63         0.50      0.58        0.48

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.48     0.40         0.25      0.34        0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.59     0.58         0.49      0.58        0.45

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.34     0.34         0.24      0.33        0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2 =  0.69
Unweighted correlation between the two sets =  0.73
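The core of regression from a correlation matrix is a single linear solve. The sketch below, in base R with made-up correlations for two predictors and one criterion, finds the standardized beta weights as the inverse of the predictor correlations times the predictor-criterion correlations, then the multiple R2 and the variance inflation factors 1/(1-SMC). The object names and values are assumptions for illustration; setCor does considerably more (standard errors, set correlations, covariates).

```r
# Toy correlations (assumed): two predictors x1, x2 and one criterion y.
Rxx <- matrix(c(1.0, 0.5,
                0.5, 1.0), 2, 2, dimnames = list(c("x1", "x2"), c("x1", "x2")))
rxy <- c(x1 = 0.6, x2 = 0.4)

beta <- solve(Rxx, rxy)                    # standardized weights: Rxx^-1 rxy
R2   <- sum(beta * rxy)                    # squared multiple correlation
vif  <- diag(solve(Rxx))                   # 1 / (1 - SMC) for each predictor

print(round(beta, 3))
print(round(c(R = sqrt(R2), R2 = R2), 3))
print(round(vif, 3))
```

Because only correlations enter the calculation, the weights are standardized and no raw data are needed, which is the convenience the text describes.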

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.58     0.46         0.21      0.18        0.30

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
          0.331    0.210        0.043     0.032       0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.44     0.35         0.17      0.14        0.26

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.19     0.12         0.03      0.02        0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2 =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,\ldots,i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
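The path logic can be sketched with lm and a simple percentile bootstrap in base R. Everything here is an assumption made for illustration: the data are simulated, the helper paths is hypothetical, and 500 resamples are used only to keep the example quick. The mediate function automates these steps and reports more. Note that with ordinary least squares the total effect of x on y equals the direct effect plus ab exactly.

```r
set.seed(1)                                # simulated data, assumed for illustration
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)                    # x -> m  (path a)
y <- 0.4 * m + 0.2 * x + rnorm(n)          # m -> y  (path b) plus a direct path
dat <- data.frame(x, m, y)

# Point estimates of the paths from ordinary least squares.
paths <- function(d) {
  a   <- coef(lm(m ~ x, data = d))[["x"]]
  fit <- coef(lm(y ~ x + m, data = d))
  c(a = a, b = fit[["m"]], direct = fit[["x"]],
    total = coef(lm(y ~ x, data = d))[["x"]])
}
est <- paths(dat)
ab  <- est[["a"]] * est[["b"]]             # indirect effect

# Percentile bootstrap of the indirect effect ab.
boot_ab <- replicate(500, {
  p <- paths(dat[sample(n, replace = TRUE), ])
  p[["a"]] * p[["b"]]
})
print(round(c(est, ab = ab), 3))
print(round(quantile(boot_ab, c(0.025, 0.975)), 3))
```

If the bootstrapped interval for ab excludes zero, the indirect effect is taken to be reliably different from zero.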

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

          • setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

          setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

          • mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

          mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

          • mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50

          > mediate.diagram(preacher)

          [Figure: "Mediation model" path diagram. THERAPY → ATTRIB = 0.82; ATTRIB → SATIS = 0.4; THERAPY → SATIS: c = 0.76, c' = 0.43.]

          Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (.43 directly, once Attribution is removed), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

          > preacher <- setCor(1, c(2, 3), sobel, std = FALSE)
          > setCor.diagram(preacher)

          [Figure: "Regression Models" path diagram. THERAPY → SATIS = 0.43; ATTRIB → SATIS = 0.4; path between THERAPY and ATTRIB = 0.21.]

          Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

          for speed. The default number of bootstraps is 5000.
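The moderated regression model mentioned above amounts to adding a product term to the regression. A minimal base-R sketch of that idea follows; the data are simulated, and the variable names and the choice to center the predictors before multiplying them are assumptions made for illustration, not a description of how mediate builds its ACTXgndr term.

```r
# Sketch of moderated multiple regression via a product term (simulated data)
set.seed(1)
n   <- 300
x   <- rnorm(n)                       # continuous predictor
mod <- rbinom(n, 1, 0.5)              # binary moderator
y   <- 0.5 * x + 0.2 * mod + 0.3 * x * mod + rnorm(n)

xc <- x - mean(x)                     # center the predictor
mc <- mod - mean(mod)                 # center the moderator
xm <- xc * mc                         # product (interaction) term

fit <- lm(y ~ xc + mc + xm)           # moderation is carried by the xm weight
print(round(coef(summary(fit)), 3))
```

Forming the product by hand gives the same interaction weight as the formula shorthand `y ~ xc * mc`; writing it out simply makes the extra predictor visible.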

          5.3 Set Correlation

          An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

          $$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

          where $\lambda_i$ is the ith eigenvalue (a squared canonical correlation) of the eigenvalue decomposition of the matrix

          $$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{xy}^{\top}$$
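To make the definition concrete, the following base-R sketch finds the squared canonical correlations as the eigenvalues of R_xx^-1 R_xy R_yy^-1 R_yx and combines them into the set correlation. It uses the built-in mtcars data purely for illustration; the split into an x set and a y set is arbitrary, and the code is a sketch of the formula rather than the setCor implementation.

```r
# Sketch: squared canonical correlations and Cohen's set correlation
# from a correlation matrix (mtcars chosen only for illustration)
x <- c("hp", "wt", "disp")   # predictor set (arbitrary choice)
y <- c("mpg", "qsec")        # criterion set (arbitrary choice)
R <- cor(mtcars[, c(x, y)])

Rxx <- R[x, x]
Ryy <- R[y, y]
Rxy <- R[x, y]

# Eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations
M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
n_can  <- min(length(x), length(y))
lambda <- sort(Re(eigen(M)$values), decreasing = TRUE)[seq_len(n_can)]

set_R2  <- 1 - prod(1 - lambda)   # Cohen's set correlation R2
mean_cc <- mean(lambda)           # average squared canonical correlation

# Cross-check against stats::cancor, which works from the raw data
cc <- cancor(mtcars[, x], mtcars[, y])$cor^2

print(round(c(lambda = lambda, set_R2 = set_R2, mean_cc = mean_cc), 3))
```

The same set correlation can be written as 1 - |R| / (|Rxx| |Ryy|), which makes its symmetry in the two sets easy to see.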

          Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic, based upon the average canonical correlation, might then be more appropriate.

          setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
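Why a correlation matrix is enough can be shown in a few lines of base R. The sketch below solves for standardized regression weights directly from the correlations and compares them with lm fitted to standardized raw data; the mtcars variables are an arbitrary illustration, and this is the textbook computation rather than a claim about the internals of setCor.

```r
# Sketch: standardized regression weights from a correlation matrix alone
vars <- c("mpg", "hp", "wt", "qsec")
R <- cor(mtcars[, vars])
x <- c("hp", "wt", "qsec")   # predictors (arbitrary choice)
y <- "mpg"                   # criterion (arbitrary choice)

beta <- solve(R[x, x], R[x, y])   # standardized beta weights
R2   <- sum(beta * R[x, y])       # squared multiple correlation

# The same model fitted to standardized raw data with lm()
z   <- as.data.frame(scale(mtcars[, vars]))
fit <- lm(mpg ~ hp + wt + qsec, data = z)

print(round(rbind(from_R = beta, from_lm = coef(fit)[x]), 3))
print(round(c(from_R = R2, from_lm = summary(fit)$r.squared), 4))
```

Significance tests need one more piece of information, the number of observations, which is why it has to be supplied when only a matrix is given.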

          Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation, covariance, or raw data matrix, and thus can report either standardized or raw β weights. Second, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

          For this example, the analysis is done on the covariance matrix C rather than the raw data.

          > C <- cov(sat.act, use = "pairwise")
          > model1 <- lm(ACT ~ gender + education + age, data = sat.act)
          > summary(model1)

          Call:
          lm(formula = ACT ~ gender + education + age, data = sat.act)

          Residuals:

          Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
              mod = "gender", n.iter = 50, std = TRUE)

          The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

          Total Direct effect(c) of ACT on SATQ = 0.58   SE = 0.03   t direct = 19.25   with probability = 0
          Direct effect (c') of ACT on SATQ removing education = 0.59   SE = 0.03   t direct = 19.26   with probability = 0
          Indirect effect (ab) of ACT on SATQ through education = -0.01
          Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = -0.02   Upper CI = 0
          Total Direct effect(c) of gender on SATQ = -0.14   SE = 0.03   t direct = -4.78   with probability = 2.1e-06
          Direct effect (c') of gender on NA removing education = -0.14   SE = 0.03   t direct = -4.63   with probability = 4.4e-06
          Indirect effect (ab) of gender on SATQ through education = 0
          Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = -0.01   Upper CI = 0
          Total Direct effect(c) of ACTXgndr on SATQ = 0   SE = 0.03   t direct = 0.02   with probability = 0.99
          Direct effect (c') of ACTXgndr on NA removing education = 0   SE = 0.03   t direct = 0.01   with probability = 0.99
          Indirect effect (ab) of ACTXgndr on SATQ through education = 0
          Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = 0   Upper CI = 0
          R2 of model = 0.37
          To see the longer output, specify short = FALSE in the print statement

          Full output

          Total effect estimates (c)
                    SATQ   se     t     Prob
          ACT       0.58 0.03 19.25 0.00e+00
          gender   -0.14 0.03 -4.78 2.10e-06
          ACTXgndr  0.00 0.03  0.02 9.85e-01

          Direct effect estimates (c')
                    SATQ   se     t     Prob
          ACT       0.59 0.03 19.26 0.00e+00
          gender   -0.14 0.03 -4.63 4.37e-06
          ACTXgndr  0.00 0.03  0.01 9.92e-01

          'a' effect estimates
                   education   se     t     Prob
          ACT           0.16 0.04  4.22 2.77e-05
          gender        0.09 0.04  2.50 1.28e-02
          ACTXgndr     -0.01 0.04 -0.15 8.83e-01

          'b' effect estimates
                     SATQ   se     t  Prob
          education -0.04 0.03 -1.45 0.147

          'ab' effect estimates
                    SATQ  boot   sd lower upper
          ACT      -0.01 -0.01 0.01     0     0
          gender    0.00  0.00 0.00     0     0
          ACTXgndr  0.00  0.00 0.00     0     0

          [Figure: "Moderation model" path diagram. Predictors ACT, gender and ACTXgndr; mediator education; outcome SATQ. Paths to education: 0.16 (ACT), 0.09 (gender), -0.01 (ACTXgndr). education → SATQ = -0.04. Paths to SATQ: ACT c = 0.58, c' = 0.59; gender c = -0.14, c' = -0.14; ACTXgndr c = 0, c' = 0. Remaining path labels among the predictors: -0.04, -0.07, 0.02.]

          Figure 18: Moderated multiple regression requires the raw data.

               Min       1Q   Median       3Q      Max
          -25.2458  -3.2133   0.7769   3.5921   9.2630

          Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
          (Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
          gender      -0.48606    0.37984  -1.280  0.20110
          education    0.47890    0.15235   3.143  0.00174 **
          age          0.01623    0.02278   0.712  0.47650
          ---
          Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          Residual standard error: 4.768 on 696 degrees of freedom
          Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
          F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

          Compare this with the output from setCor.

          > #compare with setCor
          > setCor(c(4:6), c(1:3), C, n.obs = 700)

          Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

          Multiple Regression from matrix input

          Beta weights
                      ACT  SATV  SATQ
          gender    -0.05 -0.03 -0.18
          education  0.14  0.10  0.10
          age        0.03 -0.10 -0.09

          Multiple R
           ACT SATV SATQ
          0.16 0.10 0.19

          multiple R2
             ACT   SATV   SATQ
          0.0272 0.0096 0.0359

          Multiple Inflation Factor (VIF) = 1/(1-SMC) =
             gender education       age
               1.01      1.45      1.44

          Unweighted multiple R
           ACT SATV SATQ
          0.15 0.05 0.11

          Unweighted multiple R2
           ACT SATV SATQ
          0.02 0.00 0.01

          SE of Beta weights
                     ACT SATV SATQ
          gender    0.18 4.29 4.34
          education 0.22 5.13 5.18
          age       0.22 5.11 5.16

          t of Beta Weights
                      ACT  SATV  SATQ
          gender    -0.27 -0.01 -0.04
          education  0.65  0.02  0.02
          age        0.15 -0.02 -0.02

          Probability of t <
                     ACT SATV SATQ
          gender    0.79 0.99 0.97
          education 0.51 0.98 0.98
          age       0.88 0.98 0.99

          Shrunken R2
             ACT   SATV   SATQ
          0.0230 0.0054 0.0317

          Standard Error of R2
             ACT   SATV   SATQ
          0.0120 0.0073 0.0137

          F
           ACT SATV SATQ
          6.49 2.26 8.63

          Probability of F <
               ACT     SATV     SATQ
          2.48e-04 8.08e-02 1.24e-05

          degrees of freedom of regression
          [1]   3 696

          Various estimates of between set correlations
          Squared Canonical Correlations
          [1] 0.050 0.033 0.008
          Chisq of canonical correlations
          [1] 35.8 23.1  5.6

          Average squared canonical correlation =  0.03
          Cohen's Set Correlation R2 =  0.09
          Shrunken Set Correlation R2 =  0.08
          F and df of Cohen's Set Correlation  7.26 9 1681.86
          Unweighted correlation between the two sets =  0.01

          Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
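As a check on the summary statistics reported above, the set correlation and the average squared canonical correlation can be recomputed by hand from the three printed squared canonical correlations (0.050, 0.033, 0.008). This is only worked arithmetic on the rounded printed values, so the last digit could differ slightly from what setCor computes from unrounded values.

```latex
\begin{align*}
R^2_{\text{set}} &= 1 - (1 - 0.050)(1 - 0.033)(1 - 0.008) \\
                 &= 1 - (0.950)(0.967)(0.992) \approx 1 - 0.911 = 0.089 \approx 0.09 \\[4pt]
\bar{\lambda}    &= \frac{0.050 + 0.033 + 0.008}{3} = \frac{0.091}{3} \approx 0.03
\end{align*}
```

Both values agree with the printed "Cohen's Set Correlation R2" and "Average squared canonical correlation" lines.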

          6 Converting output to APA style tables using LaTeX

          Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
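The general idea behind such converters can be sketched in base R. The helper below, `lower_to_latex`, is hypothetical and is not part of psych; it only illustrates turning the lower triangle of a correlation matrix into the lines of a LaTeX tabular, and it makes no attempt to reproduce the APA formatting that cor2latex applies.

```r
# Hypothetical helper (not part of psych): lower triangle of a correlation
# matrix written out as the lines of a LaTeX tabular environment
lower_to_latex <- function(R, digits = 2) {
  txt <- formatC(R, format = "f", digits = digits)  # keeps matrix shape
  txt[upper.tri(txt)] <- ""                         # blank the upper triangle
  body <- paste0(rownames(R), " & ",
                 apply(txt, 1, paste, collapse = " & "), " \\\\")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(R)), "}"),
    paste0(" & ", paste(colnames(R), collapse = " & "), " \\\\"),
    body,
    "\\end{tabular}")
}

tab <- lower_to_latex(cor(mtcars[, c("mpg", "hp", "wt")]))
cat(tab, sep = "\n")
```

The result is a character vector, one element per line, so it can be written to a file with writeLines or pasted into a document.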

          An example of converting the output from fa to LaTeX appears in Table 2.

          Table 2: fa2latex. A factor analysis table from the psych package in R

          Variable          MR1    MR2    MR3     h2     u2    com
          Sentences        0.91  -0.04   0.04   0.82   0.18   1.01
          Vocabulary       0.89   0.06  -0.03   0.84   0.16   1.01
          SentCompletion   0.83   0.04   0.00   0.73   0.27   1.00
          FirstLetters     0.00   0.86   0.00   0.73   0.27   1.00
          4LetterWords    -0.01   0.74   0.10   0.63   0.37   1.04
          Suffixes         0.18   0.63  -0.08   0.50   0.50   1.20
          LetterSeries     0.03  -0.01   0.84   0.72   0.28   1.00
          Pedigrees        0.37  -0.05   0.47   0.50   0.50   1.93
          LetterGroup     -0.06   0.21   0.64   0.53   0.47   1.23

          SS loadings      2.64   1.86   1.5

          MR1              1.00   0.59   0.54
          MR2              0.59   1.00   0.52
          MR3              0.54   0.52   1.00

          7 Miscellaneous functions

          A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

          block.random: Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

          df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

          cor2latex: Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

          cosinor: One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

          fisherz: Convert a correlation to the corresponding Fisher z score.

          geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

          ICC and cohen.kappa are typically used to find the reliability for raters.

          headtail combines the head and tail functions to show the first and last lines of a data set or output.

          topBottom: Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

          mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

          p.rep finds the probability of replication for an F, t, or r and estimates effect size.

          partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

          rangeCorrection will correct correlations for restriction of range.

          reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

          superMatrix: Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
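Several of the helpers in this list wrap short, standard computations. The base-R sketch below shows the underlying arithmetic for the Fisher z transformation, the geometric and harmonic means, and a block-diagonal "super matrix". The example values are arbitrary, and the code illustrates the definitions rather than the psych functions themselves.

```r
# Fisher r-to-z transformation and its inverse
r <- 0.5
z <- atanh(r)            # identical to 0.5 * log((1 + r) / (1 - r))
r_back <- tanh(z)

# Geometric and harmonic means of positive numbers
v <- c(1, 2, 4, 8)
geo_mean  <- exp(mean(log(v)))   # nth root of the product
harm_mean <- 1 / mean(1 / v)     # reciprocal of the mean reciprocal

# Block-diagonal "super matrix": A top left, B lower right, zeros elsewhere
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))

print(round(c(z = z, geo_mean = geo_mean, harm_mean = harm_mean), 3))
print(S)
```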

          8 Data sets

          A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

          Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

          bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

          sat.act: Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

          epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

          iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

          galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

          Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

          miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

          9 Development version and a user's guide

          The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

          Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

          News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

          > news(Version > "1.7.0", package = "psych")

          10 Psychometric Theory

          The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

          For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

          11 SessionInfo

          This document was prepared using the following settings.

          > sessionInfo()
          R Under development (unstable) (2017-03-05 r72309)
          Platform: x86_64-apple-darwin13.4.0 (64-bit)
          Running under: macOS Sierra 10.12.4

          Matrix products: default
          BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
          LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

          locale:
          [1] C

          attached base packages:
          [1] stats     graphics  grDevices utils     datasets  methods   base

          other attached packages:
          [1] psych_1.7.4.21

          loaded via a namespace (and not attached):
          [1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
          [5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
          [9] lattice_0.20-34

          References

          Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

          Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

          Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

          Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

          Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

          Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

          Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

          Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

          Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

          Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

          Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

          Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

          Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

          Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

          Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

          Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

          Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

          Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

          Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

          Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

          Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

          Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

          Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

          Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

          Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

          MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

          Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

          McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

          Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

          Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

          Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

          Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

          Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

          Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

          Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

          Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

          Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

          Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

          Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

          Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

          Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

          Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

          Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

          Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

          Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

          Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

          Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

          Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

          Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

          Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

          Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

          Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

          Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

          Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

          Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


          R package

          61

          ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

          rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

          SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

          spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

          table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

          vegetables 50 51violinBy 14 18vss 5 6

          weighted least squares 6withinBetween 37

          xtable 47

          62

          • Jump starting the psych package: a guide for the impatient
          • Psychometric functions are summarized in the second vignette
          • Overview of this and related documents
          • Getting started
          • Basic data analysis
            • Getting the data by using read.file
            • Data input from the clipboard
            • Basic descriptive statistics
              • Outlier detection using outlier
              • Basic data cleaning using scrub
              • Recoding categorical variables into dummy coded variables
            • Simple descriptive graphics
              • Scatter Plot Matrices
              • Density or violin plots
              • Means and error bars
              • Error bars for tabular data
              • Two dimensional displays of means and errors
              • Back to back histograms
              • Correlational structure
              • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric, tetrachoric, polyserial, and biserial correlations
          • Multilevel modeling
            • Decomposing data into within and between level correlations using statsBy
            • Generating and displaying multilevel data
            • Factor analysis by groups
          • Multiple Regression, mediation, moderation, and set correlations
            • Multiple regression from data or correlation matrices
            • Mediation and Moderation analysis
            • Set Correlation
          • Converting output to APA style tables using LaTeX
          • Miscellaneous functions
          • Data sets
          • Development version and a users guide
          • Psychometric Theory
          • SessionInfo

            1 Overview of this and related documents

            The psych package (Revelle, 2015) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, in prep), a draft of which is available at http://personality-project.org/r/book/.

            Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.

            Psychometric applications emphasize techniques for dimension reduction, including factor analysis, cluster analysis, and principal components analysis. The fa function includes five methods of factor analysis (minimum residual, principal axis, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.

            Bifactor and hierarchical factor structures may be estimated by using Schmid-Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.

            Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.

            For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy will also find within group (or subject) correlations as well as the between group correlation.

            multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject, and mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
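The wide-to-long conversion mentioned above can be illustrated with base R alone. The following is only a sketch of the idea, not the mlArrange implementation; the data frame `wide` and its column names (`id`, `t1`, `t2`, `t3`) are hypothetical.

```r
# Base-R sketch of a wide-to-long conversion (not the mlArrange code).
# Hypothetical data: 3 subjects measured on one item at 3 time points.
wide <- data.frame(id = 1:3,
                   t1 = c(2, 4, 3),
                   t2 = c(3, 5, 3),
                   t3 = c(4, 5, 2))

long <- reshape(wide,
                direction = "long",
                varying   = list(c("t1", "t2", "t3")),  # columns to stack
                v.names   = "score",                    # name of the stacked column
                timevar   = "time",
                times     = 1:3,
                idvar     = "id")

long <- long[order(long$id, long$time), ]  # one row per subject per occasion
print(long)
```

The long form has one row per subject per occasion, which is the layout that multilevel modeling functions typically expect.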

            Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation “heat maps” (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram, and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

            This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at http://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at http://personality-project.org/r/book/psych_for_sem.pdf.)

            For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at http://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at http://personality-project.org/r/book.

            2 Getting started

            Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the “Psychometrics” task view, but doing it this way is not necessary.

            install.packages("ctv")

            library(ctv)

            install.views("Psychometrics")

            The “Psychometrics” task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just the following packages:

            install.packages(c("GPArotation","mnormt"))

            Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use “dot” output of commands for any external graphics package that uses the dot language.

            3 Basic data analysis

            A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

            Remember that to run any of the psych functions, it is necessary to make the package active by using the library command:

            library(psych)

            The other packages, once installed, will be called automatically by psych.

            It is possible to automatically load psych and other functions by creating and then saving a “.First” function: e.g.,

            .First <- function(x) {library(psych)}

            3.1 Getting the data by using read.file

            Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .txt, .text, .r, .rds, .rda, .csv, .xpt, or .sav, the file will be read correctly.

            my.data <- read.file()

            If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

            my.data <- read.file(widths = c(4,rep(1,35)))   #will read in a file without a header row and 36 fields, the first of which is 4 columns, the rest of which are 1 column each.

            If the file is a .RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata), the object will be loaded. Depending on what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT), it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

            To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

            my.spss <- read.file(use.value.labels=TRUE)   #this will keep the value labels for .sav files
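The idea of choosing a reader from the file suffix can be sketched in base R. The helper `read_by_suffix` below is hypothetical and is not the read.file source code; it handles only three kinds of file and is meant just to show the dispatch logic.

```r
# Hypothetical helper illustrating suffix-based dispatch in base R.
# It is NOT read.file; it handles only .csv, .txt/.text and .rds files.
read_by_suffix <- function(path, header = TRUE) {
  suffix <- tolower(tools::file_ext(path))
  switch(suffix,
         csv  = read.csv(path, header = header),
         txt  = ,                                  # falls through to "text"
         text = read.table(path, header = header),
         rds  = readRDS(path),
         stop("unsupported suffix: ", suffix))
}

# Demonstration with a temporary comma separated file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = c(2.5, 3.5, 4.5)), tmp, row.names = FALSE)
my.data <- read_by_suffix(tmp)
str(my.data)
unlink(tmp)
```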

            3.2 Data input from the clipboard

            There are, of course, many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as “clipboard” (PCs) or “pipe(pbpaste)” (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

            read.clipboard is the base function for reading data from the clipboard.

            read.clipboard.csv for reading text that is comma delimited.

            read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

            read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

            read.clipboard.upper for reading input of an upper triangular matrix.

            read.clipboard.fwf for reading in fixed width fields (some very old data sets).

            For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

            my.data <- read.clipboard()

            This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

            > my.data <- read.clipboard(sep="\t")   #define the tab option, or

            > my.tab.data <- read.clipboard.tab()   #just use the alternative function
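Why the tab option matters for empty cells can be seen without a clipboard. This sketch uses base R's read.table on a small, made-up block of tab-delimited text; the variable names are hypothetical.

```r
# Two hypothetical records of tab-delimited text; the score in row 2 is an empty cell
txt <- "id\tscore\tage\n1\t10\t21\n2\t\t35\n"

# With an explicit tab separator, the empty cell is kept as a missing field
tab.data <- read.table(text = txt, sep = "\t", header = TRUE)
print(tab.data)
```

With the default whitespace separator, the two adjacent tabs would be treated as a single delimiter and the second row would appear to have too few fields.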

            For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).

            > my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
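The same widths vector can be tried on a file with base R's read.fwf. This is a sketch with two invented 24-character records, not output from read.clipboard.fwf.

```r
# The widths from the example above: 11 fields, 24 characters per record
widths <- c(5, 2, rep(1, 5), rep(3, 4))

# Two hypothetical fixed width records written to a temporary file
tmp <- tempfile()
writeLines(c("100011212345100200300400",
             "100020754321010020030040"), tmp)

fwf.data <- read.fwf(tmp, widths = widths)   # columns are named V1 ... V11
print(fwf.data)
unlink(tmp)
```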

            3.3 Basic descriptive statistics

            Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

            describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

            describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

            > library(psych)
            > data(sat.act)
            > describe(sat.act)   #basic descriptive statistics

                      vars   n   mean     sd median trimmed    mad min max range  skew
            gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
            education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
            age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
            ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
            SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
            SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59
                      kurtosis   se
            gender       -1.62 0.02
            education    -0.07 0.05
            age           2.42 0.36
            ACT           0.53 0.18
            SATV          0.33 4.27
            SATQ         -0.02 4.41
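To make the column labels in this output concrete, several of them can be computed by hand in base R. This is a sketch on a made-up vector; it assumes that the trimmed mean drops 10% from each tail and that se is the standard error of the mean, and it omits skew and kurtosis.

```r
# Hypothetical scores with one extreme value
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 40)
n <- sum(!is.na(x))

stats <- c(n       = n,
           mean    = mean(x),
           sd      = sd(x),
           median  = median(x),
           trimmed = mean(x, trim = 0.1),   # assumed 10% trimming from each tail
           mad     = mad(x),                # median absolute deviation (scaled)
           se      = sd(x) / sqrt(n))       # standard error of the mean
round(stats, 2)
```

Comparing mean with trimmed and median gives a quick indication of how much a few extreme scores are influencing the mean.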

            These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

            > #basic descriptive statistics by a grouping variable.
            > describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)

             Descriptive statistics by group
            group: 1
                      vars   n   mean     sd   se
            gender       1 247   1.00   0.00 0.00
            education    2 247   3.00   1.54 0.10
            age          3 247  25.86   9.74 0.62
            ACT          4 247  28.79   5.06 0.32
            SATV         5 247 615.11 114.16 7.26
            SATQ         6 245 635.87 116.02 7.41
            ------------------------------------------------------------
            group: 2
                      vars   n   mean     sd   se
            gender       1 453   2.00   0.00 0.00
            education    2 453   3.26   1.35 0.06
            age          3 453  25.45   9.37 0.44
            ACT          4 453  28.42   4.69 0.22
            SATV         5 453 610.66 112.31 5.28
            SATQ         6 442 596.00 113.07 5.38

            The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

            > sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
            +            skew=FALSE,ranges=FALSE,mat=TRUE)
            > headTail(sa.mat)

                    item group1 group2 vars    n   mean     sd    se
            gender1    1      1      0    1   27      1      0     0
            gender2    2      2      0    1   30      2      0     0
            gender3    3      1      1    1   20      1      0     0
            gender4    4      2      1    1   25      2      0     0
            ...     <NA>   <NA>   <NA>  ...  ...    ...    ...   ...
            SATQ9     69      1      4    6   51  635.9 104.12 14.58
            SATQ10    70      2      4    6   86 597.59 106.24 11.46
            SATQ11    71      1      5    6   46 657.83  89.61 13.21
            SATQ12    72      2      5    6   93 606.72 105.55 10.95

            3.3.1 Outlier detection using outlier

            One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
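The logic of this procedure can be sketched with base R's mahalanobis and qchisq functions. This is an illustration on simulated data with one planted outlier, not the outlier function itself; the object names are hypothetical.

```r
set.seed(42)                                   # reproducible hypothetical data
X <- matrix(rnorm(200 * 3), ncol = 3)
X[1, ] <- c(6, -6, 6)                          # plant one obvious outlier

# Squared Mahalanobis distance of each row from the multivariate centroid
D2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Expected chi-square quantiles with df = number of variables
expected <- qchisq(ppoints(nrow(X)), df = ncol(X))

# Q-Q plot: sorted observed D2 against the expected quantiles
plot(expected, sort(D2),
     xlab = "Quantiles of chi-square", ylab = "Mahalanobis D2")
abline(0, 1)

head(order(D2, decreasing = TRUE), 3)          # rows of the most extreme cases
```

Points that fall far above the line at the upper right of the plot are the candidates for closer inspection.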

            > png( 'outlier.png' )
            > d2 <- outlier(sat.act,cex=.8)
            > dev.off()
            null device
                      1

            Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the x axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

            3.3.2 Basic data cleaning using scrub

            If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be “cleaned” using the scrub function.

            Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

            > x <- matrix(1:120,ncol=10,byrow=TRUE)
            > colnames(x) <- paste('V',1:10,sep='')
            > new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
            > new.x

                   V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
             [1,]   1   2 NA NA NA   6   7   8   9  10
             [2,]  11  12 NA NA NA  16  17  18  19  20
             [3,]  21  22 NA NA NA  26  27  28  29  30
             [4,]  31  32 33 NA NA  36  37  38  39  40
             [5,]  41  42 43 44 NA  46  47  48  49  50
             [6,]  51  52 53 54 55  56  57  58  59  60
             [7,]  61  62 63 64 65  66  67  68  69  70
             [8,]  71  72 NA NA NA  76  77  78  79  80
             [9,]  81  82 NA NA NA  86  87  88  89  90
            [10,]  91  92 NA NA NA  96  97  98  99 100
            [11,] 101 102 NA NA NA 106 107 108 109 110
            [12,] 111 112 NA NA NA 116 117 118 119 120

            Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
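The replacement rules applied in this example can be written out in a few lines of base R. This is a sketch of the logic only, not the scrub source code; the names `cols`, `minimum`, and `maximum` are chosen here for illustration.

```r
x <- matrix(1:120, ncol = 10, byrow = TRUE)
colnames(x) <- paste("V", 1:10, sep = "")

cols    <- 3:5
minimum <- c(30, 40, 50)     # one lower bound per selected column
maximum <- 70                # a single upper bound, used for all three columns
isvalue <- 45                # a specific value to be treated as missing

new.x <- x
for (i in seq_along(cols)) {
  v   <- x[, cols[i]]
  bad <- v < minimum[i] | v > maximum | v == isvalue
  new.x[bad, cols[i]] <- NA  # replace offending entries in this column only
}
new.x
```

Only columns 3 to 5 are touched; all other columns pass through unchanged.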

            3.3.3 Recoding categorical variables into dummy coded variables

            Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form “dummy codes”, which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may be done using biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.

            Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
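Both ideas can be illustrated in base R. This is a sketch of what dummy coding and character-to-numeric conversion mean, not the dummy.code or char2numeric implementations; the example vectors are made up.

```r
# Hypothetical categorical variable
major <- c("psych", "bio", "psych", "econ", "bio")
levels.major <- sort(unique(major))

# Dummy coding: one 0/1 column per category
dummies <- sapply(levels.major, function(lev) as.numeric(major == lev))
dummies

# Converting a character column to numeric codes (levels in alphabetical order)
sex <- c("Male", "Female", "Female", "Male")
sex.numeric <- as.numeric(factor(sex))
sex.numeric
```

Each row of the dummy coded matrix contains exactly one 1, marking the category to which that case belongs.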

            3.4 Simple descriptive graphics

            Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show “cats eyes” to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in “violin” plots (Figure 5). (These are sometimes called “lava-lamp” plots.)

            3.4.1 Scatter Plot Matrices

            Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

            pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.

            Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

            Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

            3.4.2 Density or violin plots

            Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

            > png( 'pairspanels.png' )
            > sat.d2 <- data.frame(sat.act,d2)   #combine the d2 statistics from before with the sat.act data.frame
            > pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
            > dev.off()
            null device
                      1

            Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

            > png('affect.png')
            > pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
            +      main="Affect varies by movies ")
            > dev.off()
            null device
                      1

            Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

            > keys <- make.keys(msq[1:75],list(
            + EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
            +        "lively", "-sleepy", "-tired", "-drowsy"),
            + TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
            +        "-placid", "-calm", "-at.rest"),
            + PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
            +        "interested", "enthusiastic", "proud", "alert"),
            + NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
            +        "upset", "hostile", "irritable" )) )
            > scores <- scoreItems(keys,msq[,1:75])
            > png('msq.png')
            > pairs.panels(scores$scores,smoother=TRUE,
            +  main ="Density distributions of four measures of affect" )
            > dev.off()
            null device
                      1

            Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

            > data(sat.act)
            > violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

            [Plot: “Density Plot by gender for SAT V and Q”. Y axis: Observed (200 to 800). X axis groups: SATV M, SATV F, SATQ M, SATQ F.]

            Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

            3.4.3 Means and error bars

            Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

            error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a “cats eyes” plot which emphasizes the distribution of confidence within the confidence interval.

            error.bars.by does the same, but grouping the data by some condition.

            error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

            error.crosses draws the confidence intervals for an x set and a y set of the same size.
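The two standard errors mentioned above are easy to compute by hand. The sketch below uses the group sizes reported earlier (247 of 700) for the proportion and a made-up vector of scores for the mean; using the t distribution for the critical value is an assumption made here, not a statement about how error.bars is implemented.

```r
# Standard error of a proportion: sigma_p = sqrt(p * q / N)
N <- 700
p <- 247 / N                 # proportion of cases in group 1
q <- 1 - p
se.p <- sqrt(p * q / N)
round(c(p = p, se.p = se.p), 3)

# 95% confidence interval for a mean, based on the standard error of the mean
x  <- c(520, 610, 580, 700, 640, 560, 600, 630)   # hypothetical scores
se <- sd(x) / sqrt(length(x))
ci <- mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * se
round(c(lower = ci[1], mean = mean(x), upper = ci[2]), 1)
```

Because the interval is built symmetrically around the mean, it will not reflect any skew in the underlying scores.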

            The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a “lie” scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

            Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

            3.4.4 Error bars for tabular data

            However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

            > data(epi.bfi)
            > error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

            [Plot: “0.95% confidence limits”. Y axis: Dependent Variable (50 to 150). X axis: Independent Variable (bfagree, bfcon, bfext, bfneur, bfopen).]

            Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The “cats eyes” show the distribution of the confidence.

            > error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
            +        labels=c("Male","Female"),ylab="SAT score",xlab="")

            [Plot: “0.95% confidence limits”. Y axis: SAT score (200 to 800). X axis: Male, Female.]

            Figure 7: A “Dynamite plot” of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

            gt T lt- with(satacttable(gendereducation))

            gt rownames(T) lt- c(MF)

            gt errorbarstab(Tway=bothylab=Proportion of Education Levelxlab=Level of Education

            + main=Proportion of sample by education level)

            Proportion of sample by education level

            Level of Education

            Pro

            port

            ion

            of E

            duca

            tion

            Leve

            l

            000

            005

            010

            015

            020

            025

            030

            M 0 M 1 M 2 M 3 M 4 M 5

            000

            005

            010

            015

            020

            025

            030

            Figure 8 The proportion of each education level that is Male or Female By using theway=rdquobothrdquo option the percentages and errors are based upon the grand total Alterna-tively way=rdquocolumnsrdquo finds column wise percentages way=rdquorowsrdquo finds rowwise percent-ages The data can be converted to percentages (as shown) or by total count (raw=TRUE)The function invisibly returns the probabilities and standard errors See the help menu foran example of entering the data as a dataframe

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Two panels. Left, "Movies effect on arousal": Tense Arousal against Energetic Arousal. Right, "Movies effect on affect": Negative Affect against Positive Affect. Points are labeled Sad, Horror, Neutral and Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
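The way the two matrices are merged can be sketched in base R. This is only an illustration of the idea, not the psych code: the helper combine_lower_upper is a hypothetical name, and the two small matrices reuse the education, age and ACT correlations printed above for males and females.

```r
# Hypothetical helper (not part of psych): keep `lower` below the diagonal and
# place `upper` (or the difference lower - upper) above the diagonal.
combine_lower_upper <- function(lower, upper, diff = FALSE) {
  stopifnot(all(dim(lower) == dim(upper)))
  out <- lower
  above <- t(upper)                       # move lower-triangle values above the diagonal
  if (diff) above <- t(lower) - t(upper)  # first matrix minus second matrix
  out[upper.tri(out)] <- above[upper.tri(above)]
  diag(out) <- NA
  out
}

vars <- c("education", "age", "ACT")
male_r   <- matrix(c(1, .61, .16,  .61, 1, .15,  .16, .15, 1), 3,
                   dimnames = list(vars, vars))
female_r <- matrix(c(1, .52, .16,  .52, 1, .08,  .16, .08, 1), 3,
                   dimnames = list(vars, vars))

both  <- combine_lower_upper(male_r, female_r)
diffs <- combine_lower_upper(male_r, female_r, diff = TRUE)
round(both, 2)
round(diffs, 2)
```

With diff = TRUE the entries above the diagonal are the male minus female correlations, which is the sign convention the output above appears to follow.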

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero, or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
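As a small illustration of what the Holm correction does, the following sketch applies p.adjust from the stats package to three raw probabilities and then repeats the adjustment by hand. The p values are made up for the example and are not taken from any data set in this document.

```r
# Hypothetical raw p values for three correlations (illustration only)
p_raw <- c(0.01, 0.04, 0.03)

# Holm (1979) step-down adjustment from the stats package
p_holm <- p.adjust(p_raw, method = "holm")

# The same adjustment by hand: sort the p values, multiply the i-th smallest
# by (m - i + 1), force the sequence to be non-decreasing, cap at 1,
# and return the values to their original order
m <- length(p_raw)
o <- order(p_raw)
by_hand <- pmin(1, cummax((m - seq_len(m) + 1) * p_raw[o]))[order(o)]

p_holm
by_hand
```

The smallest raw probability is penalized most heavily (multiplied by the number of tests), which is why adjusted values above the diagonal of a corr.test table can be much larger than the raw values below it.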

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability 0.23
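The arithmetic behind the first two cases is simple enough to sketch in base R. The code below is an illustration using the standard t test for a correlation and the Fisher r-to-z transformation, not the r.test source; with the same inputs it should agree, to rounding, with the values printed for cases 1 and 2 above.

```r
# Case 1: significance and confidence interval of a single correlation
n <- 50; r <- 0.3
t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)        # t with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = n - 2)      # two-tailed probability
z  <- atanh(r)                                    # Fisher r-to-z
se <- 1 / sqrt(n - 3)                             # standard error of z
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)      # 95% interval, back-transformed to r
round(c(t = t_value, p = p_value, lower = ci[1], upper = ci[2]), 3)

# Case 2: difference between two independent correlations, each based on n = 30
n1 <- 30; n2 <- 30; r12 <- 0.4; r34 <- 0.6
z_diff <- (atanh(r12) - atanh(r34)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p_diff <- 2 * pnorm(-abs(z_diff))
round(c(z = abs(z_diff), p = p_diff), 2)
```

The two dependent cases (Steiger case A and case B) also need the correlations among the shared variables and are not reproduced here.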

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
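The logic of this test can be sketched in a few lines of base R. The helper steiger_chisq below is a hypothetical name and an assumption about the form of the statistic (the sum of squared Fisher z values scaled by n - 3); it is meant to show the idea rather than to reproduce cortest exactly, and the correlation matrix is made up.

```r
# Hypothetical helper (not the psych implementation): test that all
# off-diagonal population correlations are zero, using Fisher z values.
steiger_chisq <- function(R, n) {
  z <- atanh(R[lower.tri(R)])      # Fisher z of each unique correlation
  chisq <- (n - 3) * sum(z^2)      # each (n - 3) * z^2 is approximately chi square, 1 df
  df <- length(z)                  # p * (p - 1) / 2 unique correlations
  c(chisq = chisq, df = df, p = pchisq(chisq, df, lower.tail = FALSE))
}

# Made-up 3 x 3 correlation matrix and sample size
R <- matrix(c( 1, .3, .2,
              .3,  1, .1,
              .2, .1,  1), 3, 3)
res <- steiger_chisq(R, n = 100)
res
```

Applied to an identity matrix the statistic is zero, which is why the test is a natural check on whether a residual matrix still contains anything worth modeling.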

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of the multiple cuts is the polychoric correlation.
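How much φ underestimates the latent correlation can be worked out in closed form for one special case. The sketch below assumes that both variables are cut exactly at their medians; for that case a standard result for the bivariate normal gives φ = (2/π) arcsin(ρ). The function names are hypothetical, and other cut points require the numerical fitting that the tetrachoric function performs.

```r
# Assumption: both variables are dichotomized exactly at their medians.
# Then P(X > 0, Y > 0) = 1/4 + asin(rho) / (2 * pi) for a bivariate normal,
# which gives phi = (2 / pi) * asin(rho).
phi_from_rho <- function(rho) (2 / pi) * asin(rho)
rho_from_phi <- function(phi) sin(pi * phi / 2)   # inverse: the latent correlation

rho <- c(0.2, 0.5, 0.8)
phi <- phi_from_rho(rho)
round(cbind(rho = rho, phi = phi, recovered = rho_from_phi(phi)), 3)
```

For positive correlations the φ column is always smaller than the latent value, and inverting the relationship recovers the latent correlation from the observed φ.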

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix assembled from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will also sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
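The smoothing steps just described can be sketched in base R. The helper smooth_cor and its eig_floor argument are hypothetical names, the floor value is an arbitrary choice for this example, and the input matrix is made up; the sketch follows the steps in the text rather than the cor.smooth source.

```r
# Hypothetical helper (a sketch of the idea, not psych::cor.smooth itself)
smooth_cor <- function(R, eig_floor = 1e-6) {
  e <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eig_floor)             # raise non-positive eigen values
  vals <- vals * nrow(R) / sum(vals)            # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  S <- cov2cor(S)                               # restore a unit diagonal
  dimnames(S) <- dimnames(R)
  S
}

# A made-up, impossible "correlation" matrix: r12 = r23 = .9 but r13 = -.9
R <- matrix(c(  1,  .9, -.9,
               .9,   1,  .9,
              -.9,  .9,   1), 3, 3)
min(eigen(R, symmetric = TRUE)$values)   # negative: R is not positive semi-definite
S <- smooth_cor(R)
min(eigen(S, symmetric = TRUE)$values)   # positive after smoothing
round(S, 2)
```

The smoothed matrix keeps the signs of the original correlations but shrinks them until the set is internally consistent.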

> draw.tetra()

[Plot: bivariate normal distribution, rho = 0.5, phi = 0.33, cut at τ on X and Τ on Y into four quadrants (X > τ and Y > Τ, X < τ and Y > Τ, X > τ and Y < Τ, X < τ and Y < Τ), with the marginal normal densities, dnorm(x), shown along each axis]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Perspective plot "Bivariate density rho = 0.5" with axes x, y and z]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
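Equation 1 can be checked numerically with a few lines of base R. The data below are simulated, the grouping variable and effect sizes are arbitrary, and the variable names are chosen only to mirror the symbols in the equation; statsBy reports these quantities (and more) directly.

```r
set.seed(42)                              # arbitrary seed, for reproducibility only
g <- rep(1:5, each = 20)                  # hypothetical grouping variable, 5 groups of 20
x <- rnorm(100) + g                       # simulated scores with group differences
y <- 0.5 * x + rnorm(100) - 0.3 * g

x_bg <- ave(x, g); y_bg <- ave(y, g)      # group means, repeated for each person
x_wg <- x - x_bg;  y_wg <- y - y_bg       # deviations from the group mean

eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)
r_wg <- cor(x_wg, y_wg)                   # pooled within group correlation
r_bg <- cor(x_bg, y_bg)                   # between group correlation, weighted by group size

lhs <- cor(x, y)
rhs <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
c(observed = lhs, decomposed = rhs)
```

The two numbers agree because the within group deviations are uncorrelated with the group means, so the covariance of x and y splits exactly into a within part and a between part.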

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
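Why a correlation matrix is enough can be seen from the matrix form of the regression: the standardized weights are the solution of Rxx β = rxy. The following base R sketch uses a made-up set of correlations among two predictors and one criterion; it illustrates the algebra and is not a substitute for the fuller output of setCor.

```r
# Made-up correlations among two predictors and one criterion (illustration only)
Rxx <- matrix(c( 1, .5,
                .5,  1), 2, 2, dimnames = list(c("x1", "x2"), c("x1", "x2")))
rxy <- c(x1 = .6, x2 = .6)          # predictor-criterion correlations

beta <- solve(Rxx, rxy)             # standardized regression weights
R2   <- sum(beta * rxy)             # squared multiple correlation
vif  <- diag(solve(Rxx))            # variance inflation factors, 1 / (1 - SMC)

round(beta, 2)
round(R2, 2)
round(vif, 2)
```

Standard errors and significance tests additionally need the number of observations, which is why it has to be supplied when the input is a correlation matrix.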

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48
multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23
Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35
Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45
Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohen's Set Correlation R2 =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30
multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092
Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02
Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26
Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohen's Set Correlation R2 =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
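The path logic can be sketched with lm and a simple bootstrap in base R. Everything here is an illustration under stated assumptions: the data are simulated, the helper paths is a hypothetical name, and the percentile bootstrap with 500 resamples is one common choice rather than a description of what mediate does internally. To avoid confusion over labels, the effect of x ignoring m is called total and the effect of x with m controlled is called direct.

```r
set.seed(1)                                   # arbitrary seed; all data are simulated
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)                       # the mediator depends on x
y <- 0.3 * x + 0.4 * m + rnorm(n)             # the outcome depends on x and m
d <- data.frame(x, m, y)

paths <- function(d) {
  a      <- coef(lm(m ~ x, data = d))[["x"]]  # x -> m
  fit    <- lm(y ~ x + m, data = d)
  b      <- coef(fit)[["m"]]                  # m -> y, controlling for x
  direct <- coef(fit)[["x"]]                  # x -> y, controlling for m
  total  <- coef(lm(y ~ x, data = d))[["x"]]  # x -> y, ignoring m
  c(a = a, b = b, ab = a * b, direct = direct, total = total)
}

est <- paths(d)

# Percentile bootstrap of the indirect effect (500 resamples, chosen for speed)
boot_ab <- replicate(500, paths(d[sample(n, replace = TRUE), ])[["ab"]])
ci <- quantile(boot_ab, c(0.025, 0.975))

round(est, 3)
round(ci, 3)
```

In ordinary least squares the total effect equals the direct effect plus ab exactly, so the indirect effect is the part of the total effect that disappears when the mediator is controlled. The bootstrap is used because the sampling distribution of the product ab is not normal.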

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  SE = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  SE = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) the regression of multiple y variables on multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Path diagram "Mediation model" with nodes THERAPY, ATTRIB and SATIS; path values 0.82, 0.4, c = 0.76, c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct effect, with Attribution controlled, is .43), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Path diagram "Regression Models" with nodes THERAPY, ATTRIB and SATIS; path values 0.43, 0.4, 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n}(1-\lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{xy}'$$

(the $\lambda_i$ are the squared canonical correlations between the two sets).

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
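Both statistics are easy to compute once the squared canonical correlations are known. The sketch below first reuses the squared canonical correlations printed in the two setCor examples of section 5.1, and then shows how such eigen values can be obtained from a partitioned correlation matrix. The helper set_cor_R2 is a hypothetical name, and the small matrices in the second part are made up for illustration.

```r
# Squared canonical correlations copied from the two setCor outputs in section 5.1
cc2_first  <- c(0.6280, 0.1478, 0.0076, 0.0049)
cc2_second <- c(0.405, 0.023)

set_cor_R2 <- function(lambda) 1 - prod(1 - lambda)   # Cohen's set correlation
round(c(set_R2 = set_cor_R2(cc2_first),  average = mean(cc2_first)),  2)
round(c(set_R2 = set_cor_R2(cc2_second), average = mean(cc2_second)), 2)

# The eigen values can be found from a partitioned correlation matrix.
# Made-up example with two x variables and two y variables:
Rxx <- matrix(c(1, .3, .3, 1), 2)
Ryy <- matrix(c(1, .2, .2, 1), 2)
Rxy <- matrix(c(.5, .2, .1, .3), 2)                   # rows are x, columns are y
lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)
set_cor_R2(lambda)
```

In the first example one large canonical correlation (0.63 squared) is enough to push the set correlation to 0.69, while the average squared canonical correlation is only 0.2, which is the kind of discrepancy described above.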

            setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

            Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Path diagram titled "Moderation model": ACT, gender, and ACTXgndr predict SATQ, with education as the mediating variable; the paths carry the a, b, c, and c' coefficients.]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
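The symmetry is easy to verify. In the sketch below, cohen_set_r2 is a hypothetical helper written for this illustration (it is not a psych function), and the variable sets from the built-in mtcars data are arbitrary; swapping the roles of the two sets leaves R2 unchanged.

```r
# Sketch: the set correlation does not depend on which set is called "x".
# cohen_set_r2 is a hypothetical helper, not part of psych.
cohen_set_r2 <- function(R, x, y) {
  M <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
  1 - prod(1 - Re(eigen(M)$values))
}

set1 <- c("wt", "hp", "disp")   # illustrative first set
set2 <- c("mpg", "qsec")        # illustrative second set
R <- cor(mtcars[, c(set1, set2)])

forward  <- cohen_set_r2(R, x = set1, y = set2)
backward <- cohen_set_r2(R, x = set2, y = set1)
print(round(c(forward = forward, backward = backward), 4))
```

The two directions share the same non-zero eigenvalues, so the product of (1 - lambda_i) is identical.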

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00


            7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
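To make a few of these descriptions concrete, the base R sketch below computes the quantities that the fisherz, geometric.mean, harmonic.mean, and superMatrix entries describe. These are textbook re-expressions with made-up inputs, offered as an assumption about what the helpers compute rather than as their actual source code.

```r
# Sketch: base R versions of quantities described in the list above.
r <- 0.5
z <- atanh(r)                  # Fisher z: 0.5 * log((1 + r) / (1 - r))

x <- c(1, 2, 4, 8)             # made-up positive values
geo <- exp(mean(log(x)))       # geometric mean
har <- 1 / mean(1 / x)         # harmonic mean

# "Super matrix": A top left, B lower right, zeros elsewhere.
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))

print(round(c(z = z, geometric = geo, harmonic = har), 4))
print(S)
```

The geometric mean suits ratios and growth rates, while the harmonic mean suits rates such as speeds, which is the sense in which each is "appropriate" for different kinds of data.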

            8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
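As an indication of how a distance matrix such as cities might be used, the following base R sketch applies classical multidimensional scaling (stats::cmdscale) to a small made-up set of locations. The coordinates and place names are hypothetical; with psych loaded, a distance matrix like cities could presumably be passed to cmdscale in the same way.

```r
# Sketch: classical multidimensional scaling on a made-up distance matrix.
coords <- matrix(c(0, 0,
                   3, 0,
                   3, 4,
                   0, 4,
                   1, 2),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("city", 1:5), c("east", "north")))
d <- dist(coords)            # pairwise Euclidean distances

loc <- cmdscale(d, k = 2)    # recover a two dimensional configuration
print(round(loc, 3))

# The recovered configuration reproduces the original distances,
# although it may be rotated or reflected relative to coords.
print(round(max(abs(as.vector(dist(loc)) - as.vector(d))), 6))
```

Only the distances are preserved, so the orientation of the recovered map is arbitrary; this is why MDS solutions for cities are often rotated or flipped to match a conventional map.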

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


            10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

            11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


            References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



time and items. mlPlot will graph items over time for each subject; mlArrange converts wide data frames to long data frames suitable for multilevel modeling.

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at http://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at http://personality-project.org/r/book/psych_for_sem.pdf.)

For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at http://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at http://personality-project.org/r/book.

              2 Getting started

Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid, is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

install.packages("ctv")
library(ctv)
install.views("Psychometrics")

The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just two packages in addition to psych itself:

install.packages(c("GPArotation", "mnormt"))

Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

              3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:

library(psych)

The other packages, once installed, will be called automatically by psych.

It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}

3.1 Getting the data by using read.file

Although many find it convenient to copy the data to the clipboard and then use the read.clipboard functions (see below), a helpful alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .txt, .text, .r, .rds, .rda, .csv, .xpt, or .sav, the file will be read correctly.

my.data <- read.file()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

my.data <- read.file(widths = c(4,rep(1,35)))  #will read in a file without a header row and 36 fields, the first of which is 4 columns, the rest of which are 1 column each

If the file is an RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata) the object will be loaded. Depending on what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT) it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

my.spss <- read.file(use.value.labels=TRUE)  #this will keep the value labels for .sav files
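The idea of choosing a reader from the file suffix can be sketched in base R. This is only an illustration of the dispatch logic described above, not the read.file implementation: the helper name read_by_suffix is hypothetical, and it handles just a few of the suffixes that read.file supports.

```r
# Hypothetical helper (not part of psych): choose a reader from the file suffix.
read_by_suffix <- function(path, header = TRUE) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv  = read.csv(path, header = header),
    txt  = ,
    text = ,
    dat  = ,
    data = read.table(path, header = header),  # whitespace delimited text
    rds  = readRDS(path),
    stop("unsupported suffix: ", ext)
  )
}

# Usage: write a small comma separated file to a temporary location and read it back.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)), tmp, row.names = FALSE)
my.data <- read_by_suffix(tmp)
my.data
```

Unlike read.file, this sketch takes the path as an argument rather than calling file.choose, which keeps it usable in scripts.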

3.2 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   #define the tab option, or
> my.tab.data <- read.clipboard.tab()   #just use the alternative function
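Why empty spreadsheet cells need the tab option can be seen without a clipboard. The following base R sketch substitutes a text string for the clipboard contents (an assumption made so the example runs anywhere) and reads it with read.table.

```r
# Stand-in for clipboard contents: tab delimited, with an empty cell in row 2, column b.
txt <- "a\tb\tc\n1\t2\t3\n4\t\t6"

# With an explicit tab separator the empty cell is kept as a field and becomes NA.
my.tab.data <- read.table(text = txt, header = TRUE, sep = "\t")
my.tab.data
```

With the default whitespace separator the two adjacent tabs would be treated as a single delimiter, so the second record would appear to have only two fields.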

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, and the last 4 are 3 columns).

> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
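The same widths vector can be tried out with base R's read.fwf on a file instead of the clipboard. This is a sketch with made-up records; the two 24-character lines below are invented purely to match the widths in the example above.

```r
# Widths from the example: 5, 2, five 1-column fields, four 3-column fields (24 characters).
widths <- c(5, 2, rep(1, 5), rep(3, 4))

# Two invented fixed width records, written to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("123456712345100200300400",
             "000018954321001002003004"), tmp)

my.data <- read.fwf(tmp, widths = widths)
my.data   # 2 rows, 11 variables named V1 ... V11
```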

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data), and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.)

> library(psych)
> data(sat.act)
> describe(sat.act)  #basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59
          kurtosis   se
gender       -1.62 0.02
education    -0.07 0.05
age           2.42 0.36
ACT           0.53 0.18
SATV          0.33 4.27
SATQ         -0.02 4.41
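Most of the columns reported by describe have direct base R counterparts. The sketch below computes them for one made-up vector; the 10% trim and the use of stats::mad are assumptions about the defaults, so check the describe help page before relying on an exact match.

```r
x <- c(2, 4, 4, 5, 7, 9, 11)           # invented scores
n <- sum(!is.na(x))

stats <- c(n       = n,
           mean    = mean(x),
           sd      = sd(x),
           median  = median(x),
           trimmed = mean(x, trim = 0.1),  # assumed trimming fraction
           mad     = mad(x),               # median absolute deviation, scaled by 1.4826
           min     = min(x),
           max     = max(x),
           range   = max(x) - min(x),
           se      = sd(x) / sqrt(n))      # standard error of the mean
round(stats, 2)
```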

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> #basic descriptive statistics by a grouping variable.
> describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)

 Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
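A grouped breakdown of this kind can also be approximated in base R with aggregate. The sketch below uses an invented two-group data frame and a hypothetical helper named basic; it shows the idea of the by-group summary, not the describeBy implementation.

```r
# Invented data: one grouping variable and one score.
d <- data.frame(gender = c(1, 1, 1, 2, 2, 2),
                score  = c(10, 12, 14, 20, 25, 30))

# Hypothetical helper returning the "most basic" statistics for one group.
basic <- function(v) c(n = length(v), mean = mean(v), sd = sd(v),
                       se = sd(v) / sqrt(length(v)))

by.group <- aggregate(score ~ gender, data = d, FUN = basic)
by.group   # the score column is a matrix with columns n, mean, sd, se
```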

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
+     skew=FALSE,ranges=FALSE,mat=TRUE)
> headTail(sa.mat)

        item group1 group2 vars    n   mean     sd    se
gender1    1      1      0    1   27      1      0     0
gender2    2      2      0    1   30      2      0     0
gender3    3      1      1    1   20      1      0     0
gender4    4      2      1    1   25      2      0     0
...     <NA>   <NA>   <NA>  ...  ...    ...    ...   ...
SATQ9     69      1      4    6   51  635.9 104.12 14.58
SATQ10    70      2      4    6   86 597.59 106.24 11.46
SATQ11    71      1      5    6   46 657.83  89.61 13.21
SATQ12    72      2      5    6   93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
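The computation behind such a plot can be sketched with base R's mahalanobis function. The data below are simulated, with one extreme case planted in row 1; this illustrates the distance-versus-χ² comparison and is not the outlier function itself, which may for example treat missing data differently.

```r
set.seed(42)
X <- cbind(a = rnorm(100), b = rnorm(100), c = rnorm(100))  # simulated, complete data
X[1, ] <- c(6, -6, 6)                                       # plant one extreme case

# Squared Mahalanobis distance of each row from the multivariate centroid.
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Expected chi-square quantiles with df = number of variables.
expected <- qchisq(ppoints(nrow(X)), df = ncol(X))

# plot(expected, sort(d2)) would draw the Q-Q plot; here we just list the extremes.
head(order(d2, decreasing = TRUE), 3)   # row numbers of the most extreme cases
```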

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

> png( 'outlier.png' )
> d2 <- outlier(sat.act,cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the x axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

> x <- matrix(1:120,ncol=10,byrow=TRUE)
> colnames(x) <- paste('V',1:10,sep='')
> new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
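The rules applied in this example (a per-column minimum, a common maximum, and one specific value to blank out) can be written out in base R. This is a sketch of the logic only, under the assumption that the three rules are applied independently to each selected column; it is not the scrub implementation.

```r
x <- matrix(1:120, ncol = 10, byrow = TRUE)
colnames(x) <- paste0("V", 1:10)

cols <- 3:5                # columns to clean
mins <- c(30, 40, 50)      # one minimum per column
new.x <- x
for (i in seq_along(cols)) {
  v <- x[, cols[i]]
  v[v < mins[i] | v > 70 | v == 45] <- NA   # below min, above max, or equal to 45
  new.x[, cols[i]] <- v
}
new.x[4:7, 1:6]            # rows where some values in columns 3-5 survive
```

Counting the missing values with sum(is.na(new.x)) gives 27, the same number of NA cells as in the scrub output shown above.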

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses of these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and the results may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
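Both recodings can be illustrated in base R. The sketch below builds one binary column per category and then converts the same variable to integer codes through a factor; the data are invented, and the exact coding used by dummy.code and char2numeric may differ, so treat this as an illustration of the idea.

```r
major <- c("psych", "bio", "psych", "chem", "bio")   # invented categorical data
f <- factor(major)                                   # levels sorted: bio, chem, psych

# Dummy codes: one 0/1 column per category.
dummies <- sapply(levels(f), function(lev) as.integer(f == lev))
dummies

# Numeric recoding: each category replaced by the index of its level.
as.numeric(f)
```

Each row of dummies contains exactly one 1, so any one column is redundant given the others; regression models usually drop one category for that reason.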

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean with the axis length reflecting one standard deviation of the x and y variables is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

> png( 'pairspanels.png' )
> sat.d2 <- data.frame(sat.act,d2)  #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+     main="Affect varies by movies ")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75],list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+          "upset", "hostile", "irritable" )) )
> scores <- scoreItems(keys,msq[,1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+     main ="Density distributions of four measures of affect" )
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

[Plot: "Density Plot by gender for SAT V and Q"; y axis: Observed (200 to 800); groups: SATV M, SATV F, SATQ M, SATQ F]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
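The two quantities these functions are built on, a confidence interval for a mean and the standard error of a proportion, can be computed by hand. The sketch below uses invented numbers; using the t quantile rather than 1.96 for the interval is an assumption made here, so consult the error.bars help page for the exact rule it applies.

```r
# 95% confidence interval for a mean from its standard error.
x  <- c(12, 15, 11, 18, 14, 16, 13, 17)   # invented scores
n  <- length(x)
se <- sd(x) / sqrt(n)
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se
ci

# Standard error of a proportion: sigma_p = sqrt(p * q / N), with q = 1 - p.
p <- 0.3
N <- 200
se.p <- sqrt(p * (1 - p) / N)
se.p
```

Because the interval is mean ± a multiple of the standard error, it is symmetric around the mean regardless of any skew in the data.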

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

[Plot: "0.95% confidence limits"; x axis: Independent Variable (bfagree, bfcon, bfext, bfneur, bfopen); y axis: Dependent Variable]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+     labels=c("Male","Female"),ylab="SAT score",xlab="")

[Plot: bar graph titled "0.95% confidence limits"; y axis: SAT score (200 to 800); groups: Male, Female]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+     main="Proportion of sample by education level")

[Plot: "Proportion of sample by education level"; x axis: Level of Education (M 0 to M 5); y axis: Proportion of Education Level (0.00 to 0.30)]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+     cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+     ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Plot, left panel: "Movies effect on arousal"; x axis: Energetic Arousal (8 to 20); y axis: Tense Arousal (10 to 22); points labeled Sad, Horror, Neutral, Happy]
[Plot, right panel: "Movies effect on affect"; x axis: Positive Affect (6 to 12); y axis: Negative Affect (2 to 10); points labeled Sad, Horror, Neutral, Happy]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png( 'bibars.png' )
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower,upper)
> round(both,2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA
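The layout that lowerUpper produces can be imitated in base R, which may help when reading the two displays above. The following is only a sketch using two small hypothetical correlation matrices (R1 and R2 are made up for illustration, and this is not the psych implementation): one matrix fills the lower triangle, and either the other matrix or the difference of the two fills the upper triangle.

```r
# Hypothetical correlation matrices for two groups (illustration only)
vars <- c("A", "B", "C")
R1 <- matrix(c(1, .6, .2,  .6, 1, .1,  .2, .1, 1), 3, 3, dimnames = list(vars, vars))
R2 <- matrix(c(1, .5, .2,  .5, 1, .3,  .2, .3, 1), 3, 3, dimnames = list(vars, vars))

# Group 1 below the diagonal, group 2 above the diagonal
both <- R1
both[upper.tri(both)] <- R2[upper.tri(R2)]
diag(both) <- NA

# Group 1 below the diagonal, (group 1 - group 2) above the diagonal
diffs <- R1
diffs[upper.tri(diffs)] <- (R1 - R2)[upper.tri(R1)]
diag(diffs) <- NA

print(both)
print(round(diffs, 2))
```

The sign convention (first matrix minus second) matches the sat.act example, where the male education-age correlation of 0.61 and the female value of 0.52 give a difference of 0.09.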

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these raw probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
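To make the Holm correction concrete, here is a minimal sketch in base R. The four raw probabilities are hypothetical values chosen for illustration; p.adjust is the stats function referred to above, and the "by hand" lines simply spell out the step-down rule it applies.

```r
# Hypothetical raw p-values for four correlations (illustration only)
p_raw <- c(0.010, 0.040, 0.030, 0.200)

# Holm step-down adjustment via base R's p.adjust
p_holm <- p.adjust(p_raw, method = "holm")

# The same adjustment by hand: sort, multiply the i-th smallest p by (m - i + 1),
# enforce monotonicity with a running maximum, cap at 1, restore the input order
m <- length(p_raw)
o <- order(p_raw)
by_hand <- pmin(1, cummax((m - seq_len(m) + 1) * p_raw[o]))[order(o)]

print(rbind(raw = p_raw, holm = p_holm, by_hand = by_hand))
```

The smallest probability is multiplied by the number of tests, the next smallest by one fewer, and so on, which is why the adjusted values above the diagonal are never smaller than the raw values below it.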

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)

Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)  #steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability 0.23
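The first two cases are simple enough to check by hand. The sketch below uses only base R and the standard textbook formulas (the t test of a correlation with n - 2 degrees of freedom, and the Fisher z test for two independent correlations); it is meant as an illustration of what is being tested, not as the r.test source code, and the variable names are chosen here for convenience.

```r
# Case 1: t test of a single correlation (n = 50, r = .3, as in the example)
n <- 50; r <- 0.3
t_val <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_t <- 2 * pt(-abs(t_val), df = n - 2)

# Case 2: z test of the difference between two independent correlations
n1 <- 30; n2 <- 30; r12 <- 0.4; r34 <- 0.6
z_diff <- atanh(r12) - atanh(r34)             # difference of the Fisher z transforms
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of that difference
z_val <- z_diff / se_diff
p_z <- 2 * pnorm(-abs(z_val))

print(round(c(t = t_val, p_t = p_t, z = abs(z_val), p_z = p_z), 3))
```

Rounded to two decimals, the t of 2.18 and the z of 0.99 agree with the r.test output shown for cases 1 and 2.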

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
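The logic of this test can be sketched in a few lines of base R. The example below assumes the Fisher z form of the statistic described above, that is, (n - 3) times the sum of the squared z-transformed off-diagonal correlations, referred to a chi square with p(p - 1)/2 degrees of freedom. The data are simulated for illustration, and the details of cortest itself may differ.

```r
# Simulated data: 200 cases on 4 variables, the first two correlated by construction
set.seed(42)
n <- 200
x1 <- rnorm(n)
dat <- cbind(x1 = x1, x2 = x1 + rnorm(n), x3 = rnorm(n), x4 = rnorm(n))

R <- cor(dat)
z <- atanh(R[lower.tri(R)])        # Fisher z of each distinct off-diagonal correlation
chi2 <- (n - 3) * sum(z^2)         # sum of squared z values, scaled by n - 3
df <- length(z)                    # p * (p - 1) / 2 distinct correlations
p_value <- pchisq(chi2, df, lower.tail = FALSE)

print(c(chi2 = chi2, df = df, p = p_value))
```

With six variables, as in sat.act, the same count gives the 15 degrees of freedom reported above.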

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous, although latent, variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
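How much φ underestimates the latent correlation can be worked out exactly in one special case. The sketch below assumes a standard bivariate normal distribution with both variables cut at their means (0); in that case the cell probabilities have a closed form and φ = (2/π) asin(ρ). For other cut points no such shortcut applies and the tetrachoric correlation has to be found numerically, so this is an illustration of the idea rather than a general method.

```r
rho <- 0.5                                   # latent correlation, as in Figure 14

# Cell probabilities of the 2 x 2 table when both variables are cut at 0
p_both <- 1/4 + asin(rho) / (2 * pi)         # P(X > 0, Y > 0) = P(X < 0, Y < 0)
p_mixed <- 1/4 - asin(rho) / (2 * pi)        # P(X > 0, Y < 0) = P(X < 0, Y > 0)

# phi is the Pearson r of the dichotomized variables (all four margins are 0.5 here)
phi <- (p_both^2 - p_mixed^2) / (0.5 * 0.5)

# Inverting the relation recovers the latent (tetrachoric) correlation
rho_back <- sin(pi * phi / 2)

print(round(c(phi = phi, tetrachoric = rho_back), 2))
```

For ρ = 0.5 this gives φ = 1/3, the phi = 0.33 shown in the draw.tetra figure.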

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix assembled from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will also sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
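The smoothing steps just described can be sketched in base R. The matrix below is a hypothetical improper "correlation" matrix made up for illustration, and the tolerance used for the eigen value floor is an assumption of this sketch; cor.smooth may differ in its details.

```r
# A hypothetical "correlation" matrix that is not positive semi-definite
R <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

e <- eigen(R, symmetric = TRUE)
print(min(e$values))                            # negative: R is improper

tol <- 1e-12                                    # assumed tolerance for this sketch
vals <- pmax(e$values, 100 * tol)               # raise the offending eigen values
vals <- vals * nrow(R) / sum(vals)              # rescale to sum to the number of variables
smoothed <- cov2cor(e$vectors %*% diag(vals) %*% t(e$vectors))

print(round(smoothed, 2))
print(min(eigen(smoothed, symmetric = TRUE)$values))  # no longer negative, up to rounding error
```

The final cov2cor step restores a unit diagonal, so the result can again be treated as a correlation matrix.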

              4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

> draw.tetra()

[Figure: scatter plot of a bivariate normal distribution (rho = 0.5, phi = 0.33) cut at X = τ and Y = Τ into four quadrants, with the marginal normal densities, dnorm(x), drawn along the x and Y axes]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20, cuts=c(0,0))

[Figure: perspective plot of the bivariate density, rho = 0.5, with axes x, y, and z]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.
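Equation (1) can be verified numerically with base R. In the sketch below the data are simulated and the grouping variable g is hypothetical; the between group scores are the group means assigned to every member of the group (so the between group correlation is weighted by group size), and the within group scores are deviations from those means.

```r
set.seed(1)
g <- rep(1:5, each = 20)                    # 5 hypothetical groups of 20 cases
x <- rnorm(100) + g                         # group means differ by construction
y <- 0.5 * x + rnorm(100) - g

x_bg <- ave(x, g); x_wg <- x - x_bg         # group means and within group deviations
y_bg <- ave(y, g); y_wg <- y - y_bg

eta_x_wg <- cor(x, x_wg); eta_x_bg <- cor(x, x_bg)
eta_y_wg <- cor(y, y_wg); eta_y_bg <- cor(y, y_bg)

recomposed <- eta_x_wg * eta_y_wg * cor(x_wg, y_wg) +
              eta_x_bg * eta_y_bg * cor(x_bg, y_bg)

print(c(observed = cor(x, y), recomposed = recomposed))
```

The two values agree to rounding error, because the within and between parts of each variable are uncorrelated and their covariances therefore add.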

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
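Why a covariance or correlation matrix is enough for multiple regression can be seen from the normal equations, b = Cxx^-1 Cxy. The sketch below uses base R and the mtcars data set that ships with R (chosen here only for illustration; it is not one of the psych examples) to show that the slopes found from the covariance matrix match those from lm on the raw data, and that the correlation matrix yields the standardized weights, the squared multiple correlation, and the VIF.

```r
vars <- c("mpg", "wt", "hp", "disp")        # mtcars is a data set shipped with base R
C <- cov(mtcars[, vars])
R <- cov2cor(C)

x <- c("wt", "hp", "disp"); y <- "mpg"

b_raw <- solve(C[x, x], C[x, y])            # raw slopes from the covariance matrix
b_std <- solve(R[x, x], R[x, y])            # standardized beta weights from the correlation matrix
R2 <- sum(b_std * R[x, y])                  # squared multiple correlation
vif <- diag(solve(R[x, x]))                 # 1 / (1 - SMC) for each predictor

fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
print(rbind(from_matrix = b_raw, from_lm = coef(fit)[x]))
print(c(R2_matrix = R2, R2_lm = summary(fit)$r.squared))
print(round(vif, 2))
```

Only the intercept and the tests of significance need more than the matrix: the former requires the means and the latter the number of observations.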

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x=1:4, data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.69            0.63            0.50            0.58
    LetterGroup
           0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.48            0.40            0.25            0.34
    LetterGroup
           0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.59            0.58            0.49            0.58
    LetterGroup
           0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.34            0.34            0.24            0.33
    LetterGroup
           0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2 =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x=3:4, data=Thurstone, z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.58            0.46            0.21            0.18
    LetterGroup
           0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
          0.331           0.210           0.043           0.032
    LetterGroup
          0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.44            0.35            0.17            0.14
    LetterGroup
           0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.19            0.12            0.03            0.02
    LetterGroup
           0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2 =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees
FourLetterWords            0.52     0.11         0.09      0.06
Suffixes                   0.11     0.60        -0.01      0.01
LetterSeries               0.09    -0.01         0.75      0.28
Pedigrees                  0.06     0.01         0.28      0.66
LetterGroup                0.13     0.03         0.37      0.20
                LetterGroup
FourLetterWords        0.13
Suffixes               0.03
LetterSeries           0.37
Pedigrees              0.20
LetterGroup            0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,\ldots,i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

Total Direct effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t direct =  2.5   with probability =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t direct =  1.35   with probability =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB   =  0.33
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.04    Upper CI =  0.69
R2 of model =  0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
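In this printed output, c labels the total effect and c' the direct effect with the mediator held constant, so the indirect effect is ab = 0.82 × 0.40 ≈ 0.33 and the total effect is c' + ab = 0.43 + 0.33 = 0.76. For ordinary least squares this decomposition holds exactly, which can be checked with base R. The sketch below uses simulated data with made-up coefficients (it does not use the sobel data and is not the mediate function), and it omits the bootstrap that mediate uses to test ab.

```r
set.seed(7)
n <- 200
x <- rnorm(n)                               # simulated predictor
m <- 0.5 * x + rnorm(n)                     # simulated mediator
y <- 0.3 * x + 0.4 * m + rnorm(n)           # simulated criterion

c_total <- coef(lm(y ~ x))["x"]             # total effect of x on y
a <- coef(lm(m ~ x))["x"]                   # path from x to m
fit <- lm(y ~ x + m)
c_direct <- coef(fit)["x"]                  # direct effect of x, holding m constant
b <- coef(fit)["m"]                         # path from m to y, holding x constant

print(c(total = unname(c_total),
        direct_plus_ab = unname(c_direct + a * b),
        indirect_ab = unname(a * b)))
```

The first two printed values are identical, which is the algebraic identity behind the "Total", "Direct", and "Indirect" lines of the mediate output.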

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

  mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Figure: Mediation model. THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS: c = 0.76, c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

[Figure: Regression Models. Paths among THERAPY, ATTRIB, and SATIS, labeled 0.43, 0.4, and 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
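Both statistics are simple functions of the squared canonical correlations, so the values printed in section 5.1 can be checked by hand. The sketch below first applies the product formula to the squared canonical correlations copied from the two Thurstone analyses, and then, as a second illustration, computes the eigen values directly for the mtcars data that ships with R (an arbitrary choice for this sketch, with the canonical matrix written as Rxx^-1 Rxy Ryy^-1 Ryx) and compares the result with the equivalent determinant form of Cohen's set correlation. The helper name set_R2 is introduced here for illustration and is not a psych function.

```r
# Squared canonical correlations copied from the two Thurstone analyses in section 5.1
lambda_full <- c(0.6280, 0.1478, 0.0076, 0.0049)   # y = 5:9, x = 1:4
lambda_part <- c(0.405, 0.023)                     # y = 5:9, x = 3:4, z = 1:2

set_R2 <- function(lambda) 1 - prod(1 - lambda)    # hypothetical helper: the product formula

print(round(c(full = set_R2(lambda_full), partialled = set_R2(lambda_part)), 2))
print(round(c(full = mean(lambda_full), partialled = mean(lambda_part)), 2))  # average squared canonical r

# The same quantity from a correlation matrix (mtcars used only for illustration)
R <- cor(mtcars[, c("mpg", "qsec", "wt", "hp", "disp")])
x <- c("wt", "hp", "disp"); y <- c("mpg", "qsec")
M <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
lambda <- Re(eigen(M)$values)                      # squared canonical correlations (one is 0)
by_det <- 1 - det(R[c(x, y), c(x, y)]) / (det(R[x, x]) * det(R[y, y]))

print(c(by_eigen = set_R2(lambda), by_determinant = by_det))
```

For the first Thurstone analysis the set correlation of 0.69 is dominated by a single large canonical correlation, while the average squared canonical correlation is only 0.2, which illustrates the caution raised above.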

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was  SATQ . The IV (X) was  ACT gender ACTXgndr . The mediating variable(s) =  education .

Total Direct effect(c) of  ACT  on  SATQ  =  0.58   S.E. =  0.03  t direct =  19.25   with probability =  0
Direct effect (c') of  ACT  on  SATQ  removing  education  =  0.59   S.E. =  0.03  t direct =  19.26   with probability =  0
Indirect effect (ab) of  ACT  on  SATQ  through  education   =  -0.01
Mean bootstrapped indirect effect =  -0.01  with standard error =  0.01  Lower CI =  -0.02    Upper CI =  0
Total Direct effect(c) of  gender  on  SATQ  =  -0.14   S.E. =  0.03  t direct =  -4.78   with probability =  2.1e-06
Direct effect (c') of  gender  on  NA  removing  education  =  -0.14   S.E. =  0.03  t direct =  -4.63   with probability =  4.4e-06
Indirect effect (ab) of  gender  on  SATQ  through  education   =  0
Mean bootstrapped indirect effect =  -0.01  with standard error =  0.01  Lower CI =  -0.01    Upper CI =  0
Total Direct effect(c) of  ACTXgndr  on  SATQ  =  0   S.E. =  0.03  t direct =  0.02   with probability =  0.99
Direct effect (c') of  ACTXgndr  on  NA  removing  education  =  0   S.E. =  0.03  t direct =  0.01   with probability =  0.99
Indirect effect (ab) of  ACTXgndr  on  SATQ  through  education   =  0
Mean bootstrapped indirect effect =  -0.01  with standard error =  0.01  Lower CI =  0    Upper CI =  0
R2 of model =  0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: Moderation model. Paths to education: ACT = 0.16, gender = 0.09, ACTXgndr = -0.01; education to SATQ = -0.04; effects on SATQ: ACT c = 0.58, c' = 0.59; gender c = -0.14, c' = -0.14; ACTXgndr c = 0, c' = 0]

Figure 18: Moderated multiple regression requires the raw data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation =  0.03
Cohen's Set Correlation R2 =  0.09
Shrunken Set Correlation R2 =  0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
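The symmetry can be checked by hand. The following is a minimal base R sketch, not the setCor implementation: it assumes Cohen's (1982) determinant form of the set correlation, R2 = 1 - |R| / (|Rx| |Ry|), and it uses the built-in mtcars data purely for illustration (the variable choices and the helper name set_cor are hypothetical). It also shows how standardized beta weights and multiple R2 follow from a correlation matrix alone.

```r
# Illustrative data: built-in mtcars, NOT the sat.act data analysed above.
x <- c("hp", "wt", "qsec")      # predictor set (arbitrary choice)
y <- c("mpg", "drat", "disp")   # criterion set (arbitrary choice)
R <- cor(mtcars[, c(x, y)])

# Standardized regression weights from the correlation matrix alone:
# beta = Rxx^{-1} Rxy, and R2 for each criterion is sum(beta * rxy).
beta <- solve(R[x, x], R[x, y])
R2 <- setNames(colSums(beta * R[x, y]), y)

# Cohen's set correlation in determinant form (assumed formula, see lead-in).
set_cor <- function(R, x, y) {
  1 - det(R[c(x, y), c(x, y)]) / (det(R[x, x]) * det(R[y, y]))
}

print(round(R2, 3))
print(c(x_predicts_y = set_cor(R, x, y),
        y_predicts_x = set_cor(R, y, x)))
```

Swapping the two sets only permutes rows and columns of the full matrix, which leaves the determinant unchanged, so both directions give the same value.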

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2    com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
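To make concrete what such a conversion involves, here is a small base R sketch that writes the lower triangle of a correlation matrix as a LaTeX tabular. It is only an illustration of the idea: the helper lower_to_latex is hypothetical, is not how cor2latex or df2latex are implemented, and produces none of their APA styling. The mtcars variables are an arbitrary example.

```r
# Hypothetical helper: lower triangle of a correlation matrix as LaTeX rows.
lower_to_latex <- function(R, digits = 2) {
  txt <- formatC(R, digits = digits, format = "f")  # keeps the matrix shape
  txt[upper.tri(txt)] <- ""                         # blank the upper triangle
  body <- paste0(rownames(R), " & ",
                 apply(txt, 1, paste, collapse = " & "), " \\\\")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(R)), "}"),
    paste0("Variable & ", paste(colnames(R), collapse = " & "), " \\\\"),
    body,
    "\\end{tabular}")
}

R <- cor(mtcars[, c("mpg", "hp", "wt")])
latex_lines <- lower_to_latex(R)
cat(latex_lines, sep = "\n")
```

In practice the psych functions named above handle the formatting details; the sketch only shows the kind of text they generate.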


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
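Several of the helpers listed above wrap short computations. The base R sketch below shows the arithmetic behind a Fisher z transformation, the geometric and harmonic means, and a block "super matrix". It uses only base R and is not the psych source code; the variable names and example values are illustrative assumptions.

```r
# Fisher z of a correlation: z = 0.5 * log((1 + r) / (1 - r)) = atanh(r)
r <- 0.5
z <- atanh(r)

# Geometric and harmonic means of positive values
x <- c(1, 2, 4, 8)
geo_mean  <- exp(mean(log(x)))   # nth root of the product
harm_mean <- 1 / mean(1 / x)     # reciprocal of the mean reciprocal

# A "super matrix": A top left, B lower right, zeros elsewhere
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
super <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
               cbind(matrix(0, nrow(B), ncol(A)), B))

print(c(z = z, geometric = geo_mean, harmonic = harm_mean))
print(super)
```

The psych versions add conveniences (for example, handling of missing values or of whole data frames) that this sketch omits.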

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

              11 SessionInfo

              This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

              References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



install.packages(c("GPArotation", "mnormt"))

Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:

library(psych)

The other packages, once installed, will be called automatically by psych.

It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}

3.1 Getting the data by using read.file

Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .txt, .text, .r, .rds, .rda, .csv, .xpt, or .sav, the file will be read correctly.

my.data <- read.file()
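The idea of choosing a reader from the file suffix can be sketched in base R. The function read_by_suffix below is a hypothetical illustration that covers only three suffixes; it is not the read.file implementation and does not open a file chooser.

```r
# Hypothetical helper: pick a base R reader from the file suffix.
read_by_suffix <- function(path, ...) {
  suffix <- tolower(tools::file_ext(path))
  switch(suffix,
         csv  = read.csv(path, ...),
         txt  = ,                                   # falls through to "text"
         text = read.table(path, header = TRUE, ...),
         rds  = readRDS(path),
         stop("unsupported suffix: ", suffix))
}

# Usage with a temporary .csv file written just for this example
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = c(2.5, NA, 4)), tmp, row.names = FALSE)
my.data <- read_by_suffix(tmp)
print(my.data)
```

A fuller version would add the SPSS, SAS and RData cases described in this section, which need readers beyond those used here.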

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

my.data <- read.file(widths = c(4, rep(1, 35))) # will read in a file without a header row and 36 fields, the first of which is 4 columns, the rest of which are 1 column each
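How a widths vector carves up each line can be seen with base R's read.fwf on a small temporary file. This is an illustrative sketch with made-up data and a shorter layout (one 4-column field followed by five 1-column fields) than the 36-field example above.

```r
# Two records: a 4-column id followed by five 1-column item responses
tmp <- tempfile(fileext = ".txt")
writeLines(c("100112345",
             "100254321"), tmp)

# widths gives the number of characters in each successive field
fwf.data <- read.fwf(tmp, widths = c(4, rep(1, 5)))
print(fwf.data)
```

Each element of widths consumes that many characters from the line, so the first record is split into 1001, 1, 2, 3, 4, 5.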

If the file is a RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata), the object will be loaded. Depending on what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT), it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

my.spss <- read.file(use.value.labels=TRUE) # this will keep the value labels for .sav files

                32 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).
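To make the "lower triangular input becomes a square matrix" step concrete, here is a minimal base R sketch. It reads the numbers from a text string rather than the clipboard, and the values and object names are made up for illustration.

```r
# A lower triangle (with diagonal) as it might be typed or copied, one row per line.
lower.text <- "
1.00
0.30 1.00
0.20 0.50 1.00
"
values <- scan(text = lower.text, quiet = TRUE)   # read the numbers row by row
p <- 3                                            # number of variables
R <- matrix(0, p, p)
R[upper.tri(R, diag = TRUE)] <- values            # column-wise fill of the upper triangle
R <- t(R)                                         # input rows become rows of the lower triangle
R[upper.tri(R)] <- t(R)[upper.tri(R)]             # mirror to get a square, symmetric matrix
print(R)
```

Filling the upper triangle column by column and then transposing is a compact way to place row-wise input into the lower triangle.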

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited, either by setting the separator or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   #define the tab option, or
> my.tab.data <- read.clipboard.tab()   #just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, and the last 4 are 3 columns each).

> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
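The same widths vector can be tried without a clipboard by using base R's read.fwf on a text connection. The two records below are invented for illustration; each is 24 characters wide (5 + 2 + 5 x 1 + 4 x 3).

```r
# Field widths: one of 5, one of 2, five of 1, four of 3.
widths <- c(5, 2, rep(1, 5), rep(3, 4))
records <- c("000011212345100200300400",
             "000023454321500600700800")
fwf.data <- read.fwf(textConnection(records), widths = widths)
print(fwf.data)
```

Each record is cut at the cumulative widths, so a single mistyped width shifts every later field; checking the number of columns against length(widths) is a cheap sanity test.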

3.3 Basic descriptive statistics

Once the data are read in, describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

> library(psych)
> data(sat.act)
> describe(sat.act)   #basic descriptive statistics
          vars   n   mean     sd median trimmed    mad min max range  skew
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59
          kurtosis   se
gender       -1.62 0.02
education    -0.07 0.05
age           2.42 0.36
ACT           0.53 0.18
SATV          0.33 4.27
SATQ         -0.02 4.41
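For readers who want to see where the columns come from, the following base R sketch computes the same kinds of statistics for a single variable. The helper describe_one is hypothetical, the built-in mtcars data stand in for real data, and the skew and kurtosis formulas are one common definition assumed here; describe may use a different variant depending on its options.

```r
# Hypothetical helper: the kinds of columns describe reports, for one variable.
describe_one <- function(x, trim = 0.1) {
  x <- x[!is.na(x)]                  # drop missing values, so n counts valid cases
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  c(n        = n,
    mean     = m,
    sd       = s,
    median   = median(x),
    trimmed  = mean(x, trim = trim), # mean after trimming 10% from each tail
    mad      = mad(x),               # median absolute deviation
    min      = min(x),
    max      = max(x),
    range    = max(x) - min(x),
    skew     = sum((x - m)^3) / (n * s^3),      # assumed definition
    kurtosis = sum((x - m)^4) / (n * s^4) - 3,  # assumed definition (excess kurtosis)
    se       = s / sqrt(n))          # standard error of the mean
}

round(describe_one(mtcars$mpg), 2)
```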

These data may then be analyzed by groups defined in a logical statement or by some other variable, e.g., to break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> #basic descriptive statistics by a grouping variable
> describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)

 Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
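A rough base R equivalent of this by-group breakdown splits the data on the grouping variable and summarizes each piece. This is a sketch on the built-in mtcars data (grouped by the am column), not the describeBy implementation.

```r
# n, mean, sd and se of mpg and hp within each level of am (0 or 1).
group.stats <- lapply(split(mtcars[c("mpg", "hp")], mtcars$am), function(d) {
  data.frame(n    = nrow(d),
             mean = colMeans(d),
             sd   = sapply(d, sd),
             se   = sapply(d, sd) / sqrt(nrow(d)))
})
print(group.stats)
```

The result is a list with one small data frame per group, which is also roughly how the grouped output above is organized.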

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
+ skew=FALSE,ranges=FALSE,mat=TRUE)
> headTail(sa.mat)
        item group1 group2 vars    n   mean     sd    se
gender1    1      1      0    1   27      1      0     0
gender2    2      2      0    1   30      2      0     0
gender3    3      1      1    1   20      1      0     0
gender4    4      2      1    1   25      2      0     0
...      ...   <NA>   <NA>  ...  ...    ...    ...   ...
SATQ9     69      1      4    6   51  635.9 104.12 14.58
SATQ10    70      2      4    6   86 597.59 106.24 11.46
SATQ11    71      1      5    6   46 657.83  89.61 13.21
SATQ12    72      2      5    6   93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
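The computation behind such a plot can be sketched in base R with the mahalanobis function. The iris measurements are used here purely as stand-in data, and this is an illustration of the idea rather than the code of outlier itself.

```r
# Squared Mahalanobis distance of each row from the multivariate centroid.
x  <- as.matrix(iris[, 1:4])                 # stand-in data: four numeric variables
d2 <- mahalanobis(x, colMeans(x), cov(x))    # D2 for every observation

# Expected chi-square quantiles, with df equal to the number of variables.
expected <- qchisq(ppoints(nrow(x)), df = ncol(x))
plot(expected, sort(d2),
     xlab = "Expected chi-square quantile", ylab = "Mahalanobis D2",
     main = "Q-Q plot of Mahalanobis D2")
abline(0, 1)                                 # points far above this line are suspect

head(order(d2, decreasing = TRUE), 5)        # row numbers of the 5 most extreme cases
```

Sorting d2 and plotting it against the chi-square quantiles gives the Q-Q display; cases that rise well above the reference line are the candidates for inspection.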

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or that only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

> png( 'outlier.png' )
> d2 <- outlier(sat.act,cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the x axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

Consider a data set of 12 rows of 10 columns with values from 1 to 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

> x <- matrix(1:120,ncol=10,byrow=TRUE)
> colnames(x) <- paste('V',1:10,sep='')
> new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
> new.x
       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
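The replacement rule used in this example can be written out by hand in base R, which makes it easy to see which cells are affected. This is a sketch of the rule as described in the text (per-column minimums, one shared maximum, one value to blank out), not the scrub source code.

```r
x <- matrix(1:120, ncol = 10, byrow = TRUE)
colnames(x) <- paste0("V", 1:10)

cols      <- 3:5
mins      <- c(30, 40, 50)   # one minimum per scrubbed column
max.value <- 70              # the same maximum for all three columns

new.x <- x
for (k in seq_along(cols)) {
  v <- new.x[, cols[k]]
  v[v < mins[k] | v > max.value | v == 45] <- NA   # out of range, or exactly 45
  new.x[, cols[k]] <- v
}

colSums(is.na(new.x))[cols]   # number of values replaced in V3, V4 and V5
```

Counting the NA values per column afterwards is a quick check that the limits did what was intended.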

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and the results may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
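Both recodings can be illustrated in base R on a small made-up variable: one 0/1 column per category, and a single numeric code per category. The variable and its values are hypothetical.

```r
# A categorical variable with three categories.
major <- c("psych", "bio", "psych", "econ", "bio")

# Dummy codes: one binary (0/1) column for each category.
categories <- sort(unique(major))
dummies <- sapply(categories, function(lev) as.integer(major == lev))
print(dummies)

# Numeric coding of the same column: categories are numbered in sorted order.
major.numeric <- as.numeric(factor(major))
print(major.numeric)
```

Each row of the dummy coded matrix contains exactly one 1, so any one column is redundant given the others; this matters when the codes are used as predictors in a regression.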

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data and for communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

> png( 'pairspanels.png' )
> sat.d2 <- data.frame(sat.act,d2)   #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+ main="Affect varies by movies ")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75],list(
+ EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+ "lively", "-sleepy", "-tired", "-drowsy"),
+ TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+ "-placid", "-calm", "-at.rest"),
+ PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+ "interested", "enthusiastic", "proud", "alert"),
+ NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+ "upset", "hostile", "irritable" )) )
> scores <- scoreItems(keys,msq[,1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+ main ="Density distributions of four measures of affect" )
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3,896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

[Figure 5 image: violin plots titled "Density Plot by gender for SAT V and Q"; x axis groups SATV M, SATV F, SATQ M, SATQ F; y axis "Observed", 200 to 800.]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot, which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
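The two kinds of error described in this list can be computed directly. The sketch below uses base R only, with the built-in mtcars data and made-up proportions as stand-ins; the interval is built from the standard error of the mean and a t quantile, which is an assumption about the exact multiplier used.

```r
# 95% confidence interval for a mean, from the standard error of the mean.
x  <- mtcars$mpg                        # stand-in data
n  <- length(x)
se <- sd(x) / sqrt(n)                   # standard error of the mean
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se
round(ci, 2)

# Standard error of a proportion: sigma_p = sqrt(p * q / N), with q = 1 - p.
p <- 0.25
N <- 100
se.p <- sqrt(p * (1 - p) / N)
se.p
```

Because both intervals are symmetric around the estimate, they cannot show skew in the underlying data, which is the limitation noted for the normal theory error bars.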

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable and conscientious, and less neurotic, than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. See the discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which are always assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

[Figure 6 image: plot titled "0.95% confidence limits"; x axis "Independent Variable" (bfagree, bfcon, bfext, bfneur, bfopen); y axis "Dependent Variable".]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+ labels=c("Male","Female"),ylab="SAT score",xlab="")

[Figure 7 image: bar graph titled "0.95% confidence limits"; bars for Male and Female; y axis "SAT score", 200 to 800.]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+ main="Proportion of sample by education level")

[Figure 8 image: bar graph titled "Proportion of sample by education level"; x axis "Level of Education" (M 0 through M 5 are legible in the extraction); y axis "Proportion of Education Level", 0.00 to 0.30.]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds row wise percentages. The data can be converted to percentages (as shown) or shown as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function (Figure 9). For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure 9 image: two panels. Left: "Movies effect on arousal", x axis "Energetic Arousal", y axis "Tense Arousal". Right: "Movies effect on affect", x axis "Positive Affect", y axis "Negative Affect". Points in both panels are labeled Sad, Horror, Neutral, and Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png( 'bibars.png' )
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
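A similar lower triangular display can be produced in base R, which shows what the rounding and blanking steps amount to. The built-in attitude data are used as a stand-in, and this is a sketch of the display only, not the lowerCor code.

```r
# Pairwise Pearson correlations, rounded to 2 digits.
R <- cor(attitude, use = "pairwise.complete.obs", method = "pearson")
lower <- round(R, 2)
lower[upper.tri(lower)] <- NA      # blank out everything above the diagonal
print(lower, na.print = "")        # print the NA cells as empty space
```

The full matrix R is still available for later analyses; only the printed copy has its upper triangle removed.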

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA
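Combining two correlation matrices in this way takes only a few lines of base R. The sketch below uses two subgroups of the built-in mtcars data as stand-ins for the two groups; it illustrates the layout rather than reproducing lowerUpper.

```r
# Correlations for two groups of the stand-in data.
vars  <- c("mpg", "hp", "wt", "qsec")
lower <- cor(mtcars[mtcars$am == 0, vars])   # group shown below the diagonal
upper <- cor(mtcars[mtcars$am == 1, vars])   # group shown above the diagonal

both <- lower
both[upper.tri(both)] <- upper[upper.tri(upper)]   # overwrite the upper triangle
diag(both) <- NA                                   # the diagonal belongs to neither group
round(both, 2)
```

Reading across the diagonal, each pair of cells compares the same correlation in the two groups.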

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
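The distinction between raw and adjusted probabilities can be seen with the two base R functions mentioned above, cor.test and p.adjust. The sketch uses four columns of the built-in attitude data as a stand-in and tests every pair of variables; it illustrates the Holm adjustment and is not the corr.test implementation.

```r
x <- attitude[, 1:4]                        # stand-in data: four numeric variables
pairs.idx <- t(combn(ncol(x), 2))           # every pair of columns (6 pairs)

# Raw two-tailed probability for each pairwise Pearson correlation.
raw.p <- apply(pairs.idx, 1, function(ij)
  cor.test(x[[ij[1]]], x[[ij[2]]])$p.value)

# Holm (1979) correction across all of the tests.
adj.p <- p.adjust(raw.p, method = "holm")

data.frame(var1 = names(x)[pairs.idx[, 1]],
           var2 = names(x)[pairs.idx[, 2]],
           raw  = round(raw.p, 3),
           holm = round(adj.p, 3))
```

The adjusted values are never smaller than the raw ones, which is why a correlation that looks significant below the diagonal may not be so above it.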

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

                2) For sample sizes of n and n2 (n2 = n if not specified) find the z of the difference betweenthe z transformed correlations divided by the standard error of the difference of two zscores

                gt rtest(3046)

                Correlation tests

                Callrtest(n = 30 r12 = 04 r34 = 06)

                Test of difference between two independent correlations

                z value 099 with probability 032

                3) For sample size n and correlations ra= r12 rb= r23 and r13 specified test for thedifference of two dependent correlations (Steiger case A)

                gt rtest(103451)

                Correlation tests

                Call[1] rtest(n = 103 r12 = 04 r23 = 01 r13 = 05 )

                Test of difference between two correlated correlations

                t value -089 with probability lt 037

                4) For sample size n test for the difference between two dependent correlations involvingdifferent variables (Steiger case B)

                gt rtest(103567558) steiger Case B

                Correlation tests

                Callrtest(n = 103 r12 = 05 r34 = 06 r23 = 07 r13 = 05 r14 = 05

                r24 = 08)

                Test of difference between two dependent correlations

                z value -12 with probability 023
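The first two cases use textbook formulas that are easy to check by hand. The following base R sketch is an illustration under that assumption rather than the r.test source: case 1 uses the usual t test of a correlation with a Fisher z confidence interval, and case 2 divides the difference of the two Fisher z values by its standard error.

```r
# Case 1: a single correlation, r = .3 with n = 50
n <- 50
r <- 0.3
t_val <- r * sqrt(n - 2) / sqrt(1 - r^2)          # t with n - 2 degrees of freedom
p_val <- 2 * pt(-abs(t_val), df = n - 2)          # two tailed probability
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))   # Fisher z interval
print(round(c(t = t_val, p = p_val, lower = ci[1], upper = ci[2]), 3))

# Case 2: two independent correlations, .4 and .6, each based on n = 30
n1 <- 30
n2 <- 30
z_diff <- abs(atanh(0.4) - atanh(0.6)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p_diff <- 2 * pnorm(-z_diff)
print(round(c(z = z_diff, p = p_diff), 2))
```

Rounded to two decimals, these values agree with the t, confidence interval, z, and probability printed for cases 1 and 2. Cases 3 and 4 (Steiger's tests for dependent correlations) need the additional correlations among the variables and are not sketched here.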

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
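The logic of the test can be sketched in a few lines of base R. This is a hedged illustration, assuming the statistic is (n - 3) times the sum of the squared Fisher z values of the off-diagonal correlations; the simulated data and the variable names are hypothetical, and cortest itself may differ in detail.

```r
set.seed(42)
n <- 200   # hypothetical number of observations
p <- 6     # number of variables, as in sat.act

# Simulated data whose population correlations are all zero
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
R <- cor(X)

z <- atanh(R[lower.tri(R)])        # Fisher z of each off-diagonal correlation
chi2 <- (n - 3) * sum(z^2)         # approximately chi square under the null
df <- p * (p - 1) / 2              # one degree of freedom per correlation
p_value <- pchisq(chi2, df, lower.tail = FALSE)
print(c(chi2 = chi2, df = df, p = p_value))
```

With six variables there are 15 distinct correlations, which is where the df = 15 in the cortest output comes from.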

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
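A small simulation shows how much φ understates the latent correlation. The sketch below is an illustration only: it assumes both variables are cut at their medians, in which case φ equals (2/π) asin(ρ) and the relation can be inverted in closed form. For cuts elsewhere no such shortcut exists and the tetrachoric correlation has to be estimated iteratively.

```r
set.seed(1)
n <- 100000
rho <- 0.5   # latent correlation

x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)   # bivariate normal with correlation rho

# phi is just the Pearson r of the two dichotomized variables
phi <- cor(as.numeric(x > 0), as.numeric(y > 0))

phi_expected <- (2 / pi) * asin(rho)   # exact value for cuts at the median
rho_recovered <- sin(pi * phi / 2)     # inverting the relation recovers rho

print(round(c(phi = phi, expected = phi_expected, recovered = rho_recovered), 2))
```

For ρ = 0.5 the expected φ is exactly 1/3, in line with the values rho = 0.5 and phi = 0.33 displayed by draw.tetra.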

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The matrix resulting from a set of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
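The smoothing steps described above can be sketched in base R. This is a simplified illustration, not the cor.smooth source: the matrix is made up to be internally inconsistent, and the eigenvalue floor `tol` is a hypothetical choice.

```r
# A made-up "correlation" matrix that is not positive semi-definite
R <- matrix(c( 1.0,  0.9,  0.9,
               0.9,  1.0, -0.9,
               0.9, -0.9,  1.0), nrow = 3, ncol = 3)

e <- eigen(R, symmetric = TRUE)
print(e$values)                         # the smallest eigen value is negative

tol <- 1e-10                            # hypothetical floor for the eigen values
vals <- pmax(e$values, tol)             # make every eigen value positive
vals <- vals * nrow(R) / sum(vals)      # rescale to sum to the number of variables

# Rebuild the matrix and restore the unit diagonal
smoothed <- cov2cor(e$vectors %*% diag(vals) %*% t(e$vectors))
print(round(smoothed, 2))
print(min(eigen(smoothed, symmetric = TRUE)$values))
```

The smoothed matrix keeps the sign pattern of the original correlations but shrinks them until the set is jointly possible.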

                4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

> draw.tetra()

[Plot: a bivariate normal distribution with rho = 0.5 and phi = 0.33, cut at τ on X and at Τ on Y into four quadrants (X > τ or X < τ by Y > Τ or Y < Τ), with the marginal normal densities dnorm(x) drawn along each axis.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Plot: perspective view of the bivariate density, rho = 0.5, with axes x, y, and z.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.
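Equation 1 can be checked numerically. The sketch below uses simulated data with a hypothetical grouping variable and base R only; it assumes the group means are expanded to one value per person, so that the between group correlation is weighted by group size. It is meant to illustrate the identity, not to reproduce the statsBy output.

```r
set.seed(7)
g <- rep(1:5, each = 40)                      # hypothetical grouping variable
shift <- c(-1, 0, 1, 2, 3)[g]                 # group level differences
x <- rnorm(200) + shift
y <- 0.5 * x + rnorm(200) - 0.8 * shift

x_bg <- ave(x, g)                             # group means, one value per person
y_bg <- ave(y, g)
x_wg <- x - x_bg                              # deviations from the group mean
y_wg <- y - y_bg

eta_x_wg <- cor(x, x_wg)                      # correlation of the data with the
eta_y_wg <- cor(y, y_wg)                      # within group values ...
eta_x_bg <- cor(x, x_bg)                      # ... and with the group means
eta_y_bg <- cor(y, y_bg)

r_wg <- cor(x_wg, y_wg)                       # pooled within group correlation
r_bg <- cor(x_bg, y_bg)                       # correlation of the group means

r_xy <- cor(x, y)
recomposed <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
print(c(observed = r_xy, recomposed = recomposed, within = r_wg, between = r_bg))
```

The observed and recomposed correlations agree to machine precision, while the within and between group correlations can differ from each other in both size and sign.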

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)

faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with raw data, covariance matrices, or correlation matrices.
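Why a correlation matrix is enough can be seen from the normal equations: the standardized weights are the solution of Rxx β = Rxy. The base R sketch below illustrates this with four columns of the built-in mtcars data (an assumption made only so that the code runs without psych) and checks the result against lm on standardized raw data. It is not the setCor code.

```r
vars <- c("mpg", "wt", "hp", "qsec")   # illustrative columns of mtcars
R <- cor(mtcars[, vars])
x <- c("wt", "hp", "qsec")             # predictors
y <- "mpg"                             # criterion

beta <- solve(R[x, x], R[x, y])        # standardized regression weights
R2 <- sum(beta * R[x, y])              # squared multiple correlation
vif <- diag(solve(R[x, x]))            # 1/(1 - SMC) for each predictor

# The same weights from the raw data, after standardizing every variable
z <- as.data.frame(scale(mtcars[, vars]))
fit <- lm(mpg ~ wt + hp + qsec, data = z)

print(round(rbind(from_matrix = beta, from_lm = coef(fit)[x]), 3))
print(round(c(R2 = R2, lm_R2 = summary(fit)$r.squared), 3))
print(round(vif, 2))
```

Standard errors and probabilities additionally require the number of observations, which is why these appear only when the sample size is supplied along with a matrix.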

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.69     0.63         0.50      0.58        0.48

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.48     0.40         0.25      0.34        0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary SentCompletion FirstLetters
     3.69       3.88           3.00         1.35

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.59     0.58         0.49      0.58        0.45

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.34     0.34         0.24      0.33        0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.58     0.46         0.21      0.18        0.30

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
          0.331    0.210        0.043     0.032       0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion FirstLetters
          1.02         1.02

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.44     0.35         0.17      0.14        0.26

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.19     0.12         0.03      0.02        0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1, 2, \ldots, i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
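The path logic can be sketched with ordinary regressions in base R. The example below uses simulated data and a hypothetical helper, `paths`; it estimates a, b, the direct effect, and the total effect, and then forms a percentile bootstrap interval for ab. It is a minimal illustration of the idea, not the mediate function, and the number of resamples is kept small for speed.

```r
set.seed(123)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)                  # simulated mediator
y <- 0.4 * m + 0.2 * x + rnorm(n)        # simulated criterion
d <- data.frame(x, m, y)

# Hypothetical helper: the four path estimates for one data set
paths <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]          # x -> m
  fit <- coef(lm(y ~ x + m, data = d))
  total <- coef(lm(y ~ x, data = d))[["x"]]      # x -> y ignoring m
  c(a = a, b = fit[["m"]], direct = fit[["x"]], total = total)
}

est <- paths(d)
ab <- est[["a"]] * est[["b"]]                    # indirect effect

# Percentile bootstrap of the indirect effect
boot_ab <- replicate(500, {
  p <- paths(d[sample(n, replace = TRUE), ])
  p[["a"]] * p[["b"]]
})

print(round(est, 3))
print(round(c(ab = ab, quantile(boot_ab, c(0.025, 0.975))), 3))
```

With least squares estimates the total effect is exactly the direct effect plus ab, so the three quantities are not independent pieces of evidence.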

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   SE = 0.31   t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32   t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17   Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c( "SATV","SATQ"),x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c( "SATV" ),x = c("education","age"),m= "ACT", data =sat.act,std=TRUE,n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Path diagram "Mediation model": THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, THERAPY to SATIS c = 0.76 and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Path diagram "Regression Models": THERAPY and ATTRIB predicting SATIS; displayed values 0.43, 0.4, and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$
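The formula is easy to verify from the squared canonical correlations printed by setCor for the first Thurstone example (y = 5:9, x = 1:4). The second half of the sketch computes the eigen values directly from a correlation matrix; it assumes the standard result that the eigen values of the product Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations, and uses columns of the built-in mtcars data purely for illustration.

```r
# Squared canonical correlations reported for setCor(y = 5:9, x = 1:4, data = Thurstone)
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)
set_R2 <- 1 - prod(1 - lambda)     # Cohen's set correlation
avg_cc <- mean(lambda)             # average squared canonical correlation
print(round(c(set_R2 = set_R2, average = avg_cc), 2))

# The same quantities from a correlation matrix (illustrative data)
xs <- c("wt", "hp", "disp")        # first set
ys <- c("mpg", "qsec")             # second set
R <- cor(mtcars[, c(xs, ys)])
M <- solve(R[xs, xs]) %*% R[xs, ys] %*% solve(R[ys, ys]) %*% R[ys, xs]

# Only min(length(xs), length(ys)) eigen values are non-zero
lam <- Re(eigen(M)$values)[seq_len(min(length(xs), length(ys)))]
print(round(c(lam, set_R2 = 1 - prod(1 - lam)), 3))
```

The first part reproduces the reported set correlation of 0.69 and average of 0.2. The gap between the two numbers shows how a single large canonical correlation can dominate the set correlation.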

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   SE = 0.03   t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   SE = 0.03   t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = -0.02   Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14   SE = 0.03   t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   SE = 0.03   t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = -0.01   Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0   SE = 0.03   t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   SE = 0.03   t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = 0   Upper CI = 0
R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Path diagram "Moderation model": ACT, gender, and ACTXgndr predict SATQ through education; a paths 0.16, 0.09, and -0.01; b path -0.04; c = 0.58 and c' = 0.59 for ACT; c = -0.14 and c' = -0.14 for gender; c = 0 and c' = 0 for ACTXgndr; other displayed values -0.04, -0.07, and 0.02.]

Figure 18: Moderated multiple regression requires the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender education age
  1.01      1.45 1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper diagonal), irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex
A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00


                7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.

                8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems) are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

                9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")


                10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.




separated files), normal .txt or .text files, .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

my.spss <- read.file(use.value.labels=TRUE)  #this will keep the value labels for .sav files

3.2 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   #define the tab option, or
> my.tab.data <- read.clipboard.tab()   #just use the alternative function
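Why empty spreadsheet cells need the tab option can be seen without a clipboard at all. The sketch below uses base R's read.table on a made-up string (txt is hypothetical spreadsheet content with one empty cell); it illustrates the delimiter issue only and is not how read.clipboard is implemented.

```r
# Hypothetical spreadsheet contents: the second data row has an empty cell for y.
txt <- "id\ty\tz\n1\t10\ta\n2\t\tb\n3\t30\tc"

# With an explicit tab separator the empty cell is kept in place and read as NA.
tab <- read.table(text = txt, header = TRUE, sep = "\t")
tab$y

# With the default whitespace separator the two tabs collapse into one gap,
# so the second data row looks one field short and read.table() fails.
ws <- try(read.table(text = txt, header = TRUE), silent = TRUE)
inherits(ws, "try-error")
```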

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).

> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
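The same widths vector can be tried out with base R's read.fwf on a text connection, which is a convenient way to check that the widths add up before pasting real data. The record rec below is invented for illustration; only the widths come from the example above.

```r
# Field widths from the example: 5, 2, five fields of 1, four fields of 3.
widths <- c(5, 2, rep(1, 5), rep(3, 4))

# One made-up record of exactly sum(widths) = 24 characters.
rec <- "123450712345100200300400"

con <- textConnection(rec)
fw <- read.fwf(con, widths = widths)   # base R fixed-width reader
close(con)

fw   # 11 variables, V1 through V11
```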

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data), and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.)

> library(psych)
> data(sat.act)
> describe(sat.act)  #basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59
          kurtosis   se
gender       -1.62 0.02
education    -0.07 0.05
age           2.42 0.36
ACT           0.53 0.18
SATV          0.33 4.27
SATQ         -0.02 4.41
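The se column is simply the standard deviation divided by the square root of the number of non-missing cases. A minimal base R sketch, using a made-up vector x rather than sat.act so that it runs without the package, shows the calculation and checks it against the SATV row of the output above.

```r
x <- c(2, 4, 4, 5, 7, 9, NA)     # made-up scores with one missing value

n  <- sum(!is.na(x))             # cases actually observed
m  <- mean(x, na.rm = TRUE)
s  <- sd(x, na.rm = TRUE)
se <- s / sqrt(n)                # standard error of the mean
c(n = n, mean = m, sd = s, se = se)

# The SATV row reports sd = 112.90 with n = 700:
round(112.90 / sqrt(700), 2)     # 4.27, matching the se column
```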

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> #basic descriptive statistics by a grouping variable.
> describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)

Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
+          skew=FALSE,ranges=FALSE,mat=TRUE)
> headTail(sa.mat)

        item group1 group2 vars    n   mean     sd    se
gender1    1      1      0    1   27      1      0     0
gender2    2      2      0    1   30      2      0     0
gender3    3      1      1    1   20      1      0     0
gender4    4      2      1    1   25      2      0     0
...     <NA>   <NA>   <NA>  ...  ...    ...    ...   ...
SATQ9     69      1      4    6   51  635.9 104.12 14.58
SATQ10    70      2      4    6   86 597.59 106.24 11.46
SATQ11    71      1      5    6   46 657.83  89.61 13.21
SATQ12    72      2      5    6   93 606.72 105.55 10.95
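For readers who want the same kind of flat, one-row-per-group table without the package, base R's aggregate gives a rough equivalent. This is only a sketch on an invented data frame df (the names g and y are hypothetical); it is not what describeBy does internally and it covers only n, mean, sd and se.

```r
df <- data.frame(g = c(1, 1, 2, 2, 2),
                 y = c(10, 14, 9, 11, 13))   # made-up grouping variable and scores

# One row per group; FUN returns several statistics at once.
stats <- aggregate(y ~ g, data = df,
                   FUN = function(v) c(n = length(v), mean = mean(v), sd = sd(v),
                                       se = sd(v) / sqrt(length(v))))

# aggregate() stores the statistics as a matrix column; flatten it to plain columns.
flat <- do.call(data.frame, stats)
flat
```

The flattened result can be written out with write.csv or merged with other summaries, which is the same use case as mat=TRUE.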

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
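The logic can be sketched in base R with stats::mahalanobis and qchisq. The data here are simulated, with one outlier planted in the first row, and the sketch assumes complete data; the outlier function itself may differ in details such as how it treats missing values and labels points.

```r
set.seed(42)
X <- matrix(rnorm(200 * 3), ncol = 3)    # simulated data: 200 cases, 3 variables
X[1, ] <- c(6, -6, 6)                    # plant one obvious outlier

# Squared Mahalanobis distance of each case from the multivariate centroid.
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Expected chi-square quantiles with df equal to the number of variables.
expected <- qchisq(ppoints(nrow(X)), df = ncol(X))

# Q-Q plot of observed against expected distances (drawn only in interactive use).
if (interactive()) {
  plot(expected, sort(d2), xlab = "Quantiles of chi-square", ylab = "Mahalanobis D2")
  abline(0, 1)
}

which.max(d2)                            # row of the most extreme case
```

Sorting d2 in decreasing order lists the cases that sit furthest above the line in the plot.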

> png( 'outlier.png' )
> d2 <- outlier(sat.act,cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the X axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

> x <- matrix(1:120,ncol=10,byrow=TRUE)
> colnames(x) <- paste('V',1:10,sep='')
> new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
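The replacement rule used in this example can be written out in base R to make the logic explicit. This is a sketch of the rule as stated in the text, not the implementation of scrub; the helper name scrub_sketch and its argument names are hypothetical.

```r
# Hypothetical helper: apply per-column minimums, a common maximum, and an
# exact value to a chosen set of columns, replacing offending cells with NA.
scrub_sketch <- function(x, where, min, max, isvalue) {
  for (k in seq_along(where)) {
    col <- x[, where[k]]
    bad <- col < min[k] | col > max | col == isvalue
    x[bad, where[k]] <- NA
  }
  x
}

x <- matrix(1:120, ncol = 10, byrow = TRUE)
colnames(x) <- paste("V", 1:10, sep = "")
new.x <- scrub_sketch(x, where = 3:5, min = c(30, 40, 50), max = 70, isvalue = 45)
new.x[4:7, 3:5]   # only rows 4 to 7 keep any values in columns 3 to 5
```

Writing the three conditions as one logical vector per column shows why a cell such as the 45 in V5 is removed twice over: it is below that column's minimum of 50 and also matches isvalue.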

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and the results may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
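The idea behind both recodings can be illustrated in base R. This is a minimal sketch with made-up values; it shows what dummy codes and numeric recodes look like, and does not reproduce the output format of dummy.code or char2numeric.

```r
# A small categorical variable (made-up values for illustration)
major <- c("psych", "bio", "psych", "math", "bio", "psych")

# One binary (0/1) column per category
levels_major <- sort(unique(major))
dummies <- sapply(levels_major, function(lev) as.numeric(major == lev))
dummies

# A point biserial correlation is a regular Pearson r with a dummy code
score <- c(12, 9, 14, 10, 8, 13)          # made-up continuous scores
r_pb <- cor(dummies[, "psych"], score)

# Converting category labels to numbers; factor levels are sorted
# alphabetically, so Female = 1 and Male = 2 here
gender <- c("Male", "Female", "Female", "Male")
gender_num <- as.numeric(factor(gender))
```

Each row of dummies contains exactly one 1, which is why a full set of dummy codes is linearly dependent and one category is usually dropped in regression.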

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

> png( 'pairspanels.png' )
> sat.d2 <- data.frame(sat.act,d2) #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+      main="Affect varies by movies ")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75],list(
+ EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+        "lively", "-sleepy", "-tired", "-drowsy"),
+ TA =c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+        "-placid", "-calm", "-at.rest") ,
+ PA =c("active", "excited", "strong", "inspired", "determined", "attentive",
+        "interested", "enthusiastic", "proud", "alert"),
+ NAf =c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+        "upset", "hostile", "irritable" )) )
> scores <- scoreItems(keys,msq[,1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+  main ="Density distributions of four measures of affect" )
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

[Plot "Density Plot by gender for SAT V and Q": x axis groups SATV M, SATV F, SATQ M, SATQ F; y axis "Observed", 200 to 800]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draw the confidence intervals for an x set and a y set of the same size.
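The two standard errors mentioned in the list above can be computed directly. This is a minimal sketch with made-up numbers; using a t quantile for the interval is an assumption made for the example, and the variable names are not part of psych.

```r
# Standard error of a proportion: sqrt(p * q / N), with q = 1 - p
p <- 0.30
N <- 200
se_p <- sqrt(p * (1 - p) / N)

# Standard error of the mean and a 95% confidence interval for one variable
x <- c(12, 15, 11, 14, 13, 16, 12, 15)      # made-up scores
se_mean <- sd(x) / sqrt(length(x))
crit <- qt(0.975, df = length(x) - 1)       # assumption: t-based interval
ci <- mean(x) + c(-1, 1) * crit * se_mean

round(c(se.p = se_p, se.mean = se_mean, lower = ci[1], upper = ci[2]), 3)
```

Because the interval is built as the mean plus or minus a multiple of the standard error, it is symmetric by construction, whatever the shape of the data.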

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

[Plot "0.95% confidence limits": x axis "Independent Variable" with bfagree, bfcon, bfext, bfneur, bfopen; y axis "Dependent Variable", 50 to 150]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+         labels=c("Male","Female"),ylab="SAT score",xlab="")

[Plot "0.95% confidence limits": bars for Male and Female; y axis "SAT score", 200 to 800]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+  main="Proportion of sample by education level")

[Plot "Proportion of sample by education level": x axis "Level of Education" with bars labeled M 0 through M 5; y axis "Proportion of Education Level", 0.00 to 0.30]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+     cex=2,colors=colors, main =' Movies effect on arousal')
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+     ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Left panel "Movies effect on arousal": x axis "Energetic Arousal", 8 to 20; y axis "Tense Arousal", 10 to 22. Right panel "Movies effect on affect": x axis "Positive Affect", 6 to 12; y axis "Negative Affect", 2 to 10. Points in both panels are labeled Sad, Horror, Neutral, Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png( 'bibars.png' )
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero, or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
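The effect of the Holm correction on a set of raw probabilities can be seen with p.adjust from the stats package. The p values below are made up for illustration and are not taken from the sat.act analysis.

```r
# Raw two-tailed p values for five correlations (made-up numbers)
p_raw <- c(0.001, 0.012, 0.020, 0.040, 0.300)

# Holm (1979) step-down adjustment: the smallest p is multiplied by 5,
# the next by 4, and so on, with the results kept monotonically increasing
p_holm <- p.adjust(p_raw, method = "holm")

data.frame(raw = p_raw, holm = p_holm)
```

With these made-up values, three of the four raw probabilities below .05 no longer fall below .05 after adjustment, which is the reason for the caution about uncorrected values.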

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8) #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23
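The first two cases can be reproduced by hand with standard formulas in base R. This is a sketch of the textbook computations, not the source of r.test; the variable names are chosen for the example, and the two dependent-correlation cases (Steiger A and B) are not attempted here.

```r
# Case 1: t test of a single correlation, r = .3 with n = 50
n <- 50
r <- 0.3
t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_value), df = n - 2)

# Case 2: two independent correlations, r12 = .4 and r34 = .6, n = 30 each
n1 <- 30
n2 <- 30
z_diff <- atanh(0.4) - atanh(0.6)            # Fisher r-to-z transformation
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3)) # standard error of the difference
z_value <- abs(z_diff) / se_diff
p_diff <- 2 * pnorm(-z_value)

round(c(t = t_value, z = z_value, p.diff = p_diff), 2)
```

The rounded t and z values agree with the 2.18 and 0.99 reported in the r.test output for cases 1 and 2.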

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
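How much φ underestimates the latent correlation can be checked numerically. For a bivariate normal distribution cut at the mean of both variables, the expected φ is (2/π) asin(ρ), a standard result; the simulation below (sample size and seed are arbitrary choices) is only a sketch, and does not use the estimation procedure of the tetrachoric function.

```r
# Latent correlation and the phi expected when both variables are cut at 0
rho <- 0.5
phi_expected <- (2 / pi) * asin(rho)      # one third when rho = 0.5

# Check by simulating bivariate normal data and dichotomizing at 0
set.seed(1)
n <- 100000
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
phi_observed <- cor(as.numeric(x > 0), as.numeric(y > 0))

round(c(rho = cor(x, y), phi.expected = phi_expected, phi.observed = phi_observed), 2)
```

The expected value of one third matches the phi = 0.33 shown for rho = 0.5 in the draw.tetra display.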

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
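The smoothing steps described above (make the eigenvalues positive, rescale them to sum to the number of variables, rebuild the matrix) can be sketched in base R. The matrix and the tolerance are made up for the example, and cor.smooth may differ in its tolerance and other details.

```r
# A symmetric "correlation" matrix that is not positive semi-definite
# (made-up values chosen so that one eigenvalue is negative)
R <- matrix(c(1.0, 0.9, 0.2,
              0.9, 1.0, 0.9,
              0.2, 0.9, 1.0), ncol = 3)
e <- eigen(R, symmetric = TRUE)
min(e$values)                                  # negative

# Replace negative or tiny eigenvalues with a small positive value,
# rescale so the eigenvalues sum to the number of variables, and rebuild
tol <- 1e-8                                    # hypothetical tolerance
vals <- pmax(e$values, tol)
vals <- vals * ncol(R) / sum(vals)
R_smooth <- e$vectors %*% diag(vals) %*% t(e$vectors)
R_smooth <- cov2cor(R_smooth)                  # restore a unit diagonal

round(R_smooth, 2)
```

The smoothed matrix stays close to the original but can now be used by procedures that require a proper correlation matrix.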

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

> draw.tetra()

[Figure 14 plot: bivariate normal with rho = 0.5 and phi = 0.33, cut at thresholds τ on X and Τ on Y into four quadrants (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with the marginal normal densities dnorm(x) marked at X > τ and Y > Τ.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand = 20, cuts = c(0, 0))

[Figure 15 plot: perspective view of the bivariate density with rho = 0.5; axes x, y, z.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.
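The contrast drawn in Figures 14 and 15 between the latent correlation and the φ found from dichotomized scores can be checked numerically. The sketch below uses only base R and simulated data; the seed, sample size, and variable names are illustrative assumptions, and it is not the fitting procedure used by psych. For a bivariate normal distribution cut at the median of both variables, φ = (2/π) asin(ρ), so ρ = 0.5 implies φ = 1/3, which agrees with the values printed in Figure 14.

```r
# Simulate bivariate normal scores with a latent correlation of 0.5
# (seed and sample size are arbitrary choices for illustration)
set.seed(17)
n   <- 100000
rho <- 0.5
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Dichotomize both variables at a threshold of 0 (the median)
X <- as.numeric(x > 0)
Y <- as.numeric(y > 0)

r_latent   <- cor(x, y)            # Pearson r on the continuous scores
phi        <- cor(X, Y)            # phi: Pearson r on the 0/1 scores
phi_theory <- 2 * asin(rho) / pi   # expected phi for median cuts

# Inverting the same relation recovers the latent correlation from phi
rho_from_phi <- sin(pi * phi / 2)

round(c(r_latent = r_latent, phi = phi, phi_theory = phi_theory,
        rho_from_phi = rho_from_phi), 2)
```

The closed form only holds when both cuts are at the median. For other thresholds the correlation has to be found by fitting the bivariate normal to the observed cell frequencies, as the Figure 15 caption describes.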

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
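The decomposition in equation (1) can be verified on simulated data. The following is a minimal base R sketch, not the statsBy implementation; the group sizes, effect sizes, and variable names are all assumptions chosen for illustration. Each variable is split into its group mean (the between part) and its deviation from that mean (the within part), and the two products are then added back together.

```r
# Simulated two-level data (all values are illustrative assumptions)
set.seed(42)
n_groups <- 8
n_per    <- 25
g <- rep(seq_len(n_groups), each = n_per)

# Group-level parts are positively related, within-group parts negatively
gx <- rnorm(n_groups)
gy <- gx + rnorm(n_groups, sd = 0.5)
wx <- rnorm(n_groups * n_per)
wy <- -0.6 * wx + rnorm(n_groups * n_per, sd = 0.8)
x <- gx[g] + wx
y <- gy[g] + wy

# Split each variable into its group mean (between) and deviation (within)
x_bg <- ave(x, g); x_wg <- x - x_bg
y_bg <- ave(y, g); y_wg <- y - y_bg

# eta: correlation of the raw scores with the within or between parts
eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)

r_wg <- cor(x_wg, y_wg)   # pooled within-group correlation
r_bg <- cor(x_bg, y_bg)   # correlation of group means, weighted by group size

r_xy      <- cor(x, y)
r_rebuilt <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
all.equal(r_xy, r_rebuilt)
```

The two sides should agree up to floating point error, even though the within and between correlations in this sketch are built to differ in sign. That is the sense in which the overall correlation is a mixture that does not license inferences from one level to the other.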

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25, 27)], group = "education", cors = TRUE)
faBy(sb, nfactors = 5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
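To see why a correlation matrix is enough for standardized regression, the sketch below solves the normal equations directly in base R and checks the result against lm on standardized raw data. It is not the setCor source code; the built-in mtcars data and the choice of variables are assumptions made only to keep the example self-contained.

```r
# Illustrative data: one criterion and three predictors from mtcars
vars <- c("mpg", "wt", "hp", "qsec")
R <- cor(mtcars[, vars])     # only this matrix is used for the matrix solution

x <- c("wt", "hp", "qsec")   # predictor set
y <- "mpg"                   # criterion

# Standardized beta weights: solve Rxx %*% beta = Rxy
beta <- solve(R[x, x], R[x, y])
R2   <- sum(beta * R[x, y])  # squared multiple correlation

# Check against lm() applied to the standardized raw data
z   <- as.data.frame(scale(mtcars[, vars]))
fit <- lm(mpg ~ wt + hp + qsec, data = z)

round(rbind(from_matrix = beta, from_lm = coef(fit)[x]), 3)
round(c(R2_matrix = R2, R2_lm = summary(fit)$r.squared), 3)
```

The two rows of beta weights and the two R2 values should match. Standard errors and significance tests additionally need the number of observations, which a correlation matrix alone does not carry.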

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.69     0.63         0.50      0.58        0.48

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.48     0.40         0.25      0.34        0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary SentCompletion FirstLetters
     3.69       3.88           3.00         1.35

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.59     0.58         0.49      0.58        0.45

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.34     0.34         0.24      0.33        0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.58     0.46         0.21      0.18        0.30

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
          0.331    0.210        0.043     0.032       0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion FirstLetters
          1.02         1.02

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.44     0.35         0.17      0.14        0.26

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.19     0.12         0.03      0.02        0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
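The a, b, and ab paths can be illustrated with ordinary regressions and a simple bootstrap. The sketch below is written in base R on simulated data and is not the mediate implementation: the helper name indirect, the effect sizes, the number of resamples, and the use of a percentile interval are all assumptions made for illustration. In the printed output of mediate the total effect is labeled c and the effect of x with m in the model is labeled c', and the sketch follows that labeling.

```r
# Simulated data in which m carries part of the effect of x on y
# (effect sizes, seed, and sample size are arbitrary)
set.seed(123)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.3 * x + rnorm(n)
d <- data.frame(x, m, y)

# Hypothetical helper: product of path a (x -> m) and path b (m -> y given x)
indirect <- function(dat) {
  a <- coef(lm(m ~ x, data = dat))["x"]
  b <- coef(lm(y ~ x + m, data = dat))["m"]
  unname(a * b)
}

ab       <- indirect(d)
c_total  <- unname(coef(lm(y ~ x, data = d))["x"])       # total effect (c)
c_direct <- unname(coef(lm(y ~ x + m, data = d))["x"])   # direct effect (c')

# Percentile bootstrap of ab: resample rows with replacement and recompute
n_iter  <- 500
boot_ab <- replicate(n_iter, indirect(d[sample(n, replace = TRUE), ]))
ci      <- quantile(boot_ab, c(0.025, 0.975))

round(c(total = c_total, direct = c_direct, ab = ab, ci), 2)
```

With least squares estimates the total effect equals the direct effect plus ab, so the first three printed values are tied together. The bootstrap is used for the interval because the sampling distribution of a product of two coefficients is typically not symmetric.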

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   SE = 0.31  t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32  t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

  mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, niter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example (Figure 18) is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Figure 16 plot: "Mediation model" path diagram. THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, THERAPY to SATIS c = 0.76 and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76, of which .43 is the direct path and .33 is the indirect path through Attribution. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2, 3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure 17 plot: "Regression Models" path diagram. THERAPY to SATIS = 0.43, ATTRIB to SATIS = 0.4, THERAPY with ATTRIB = 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{xy}'$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
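The set correlation and the average squared canonical correlation can both be computed from a correlation matrix with a few lines of base R. The sketch below is not the setCor source; it assumes the standard canonical correlation matrix Rxx^-1 Rxy Ryy^-1 Ryx and uses the built-in iris measurements purely as an illustrative pair of variable sets.

```r
# Illustrative sets from iris: sepal measures (x) and petal measures (y)
R  <- cor(iris[, 1:4])
ix <- 1:2
iy <- 3:4
Rxx <- R[ix, ix]
Ryy <- R[iy, iy]
Rxy <- R[ix, iy]

# Squared canonical correlations: eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- sort(Re(eigen(M)$values), decreasing = TRUE)

set_R2  <- 1 - prod(1 - lambda)   # Cohen's set correlation
mean_cc <- mean(lambda)           # average squared canonical correlation

# Cross-check the eigenvalues against stats::cancor on the raw data
cc <- cancor(iris[, ix], iris[, iy])$cor^2

round(c(lambda = lambda, set_R2 = set_R2, mean_cc = mean_cc), 3)
```

Because the set correlation is one minus a product of terms that are each at most 1, it can never be smaller than the largest squared canonical correlation. A single strong canonical pair is therefore enough to make it large, while the average squared canonical correlation stays low when the remaining pairs are weak.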

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", niter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   SE = 0.03  t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   SE = 0.03  t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02   Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14   SE = 0.03  t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   SE = 0.03  t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01   Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0   SE = 0.03  t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   SE = 0.03  t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0   Upper CI = 0
R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure 18 plot: "Moderation model" path diagram with ACT, gender, and ACTXgndr predicting SATQ through education. Paths to education: 0.16, 0.09, -0.01; education to SATQ: -0.04; ACT c = 0.58, c' = 0.59; gender c = -0.14, c' = -0.14; ACTXgndr c = 0, c' = 0.]

Figure 18: Moderated multiple regression requires the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender education  age
  1.01      1.45 1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable         MR1   MR2   MR3   h2   u2  com
Sentences       0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary      0.89  0.06 -0.03 0.84 0.16 1.01
SentCompletion  0.83  0.04  0.00 0.73 0.27 1.00
FirstLetters    0.00  0.86  0.00 0.73 0.27 1.00
4LetterWords   -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes        0.18  0.63 -0.08 0.50 0.50 1.20
LetterSeries    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees       0.37 -0.05  0.47 0.50 0.50 1.93
LetterGroup    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings     2.64  1.86  1.5

MR1             1.00  0.59  0.54
MR2             0.59  1.00  0.52
MR3             0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
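Several of the helpers listed above wrap short formulas. As a rough guide to what fisherz, geometric.mean, and harmonic.mean compute, the base R sketch below writes the formulas out directly. The input values are arbitrary, and the code is an assumption about the underlying arithmetic rather than a copy of the psych functions.

```r
# Fisher r-to-z transformation and its inverse
r      <- c(0.10, 0.50, 0.90)
z      <- atanh(r)    # same as 0.5 * log((1 + r) / (1 - r))
r_back <- tanh(z)     # transforming back returns the original correlations

# Geometric and harmonic means of a set of positive values
x         <- c(1, 2, 4, 8)
geo_mean  <- exp(mean(log(x)))   # appropriate for ratios or growth rates
harm_mean <- 1 / mean(1 / x)     # appropriate for rates such as speeds

round(z, 3)
round(c(geometric = geo_mean, harmonic = harm_mean, arithmetic = mean(x)), 3)
```

For positive values that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.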

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                  11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                  References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.




first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).

> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
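For data that are already in a file or a character vector rather than on the clipboard, base R's read.fwf accepts the same kind of widths vector. The sketch below is an illustration only: the two input records and the object names `lines` and `fixed` are made up, and it is not meant to show how read.clipboard.fwf works internally.

```r
# Two made-up fixed-width records: 5 + 2 + 5*1 + 4*3 = 24 characters each
lines <- c("123456712345100200300400",
           "543218954321111222333444")

# Same widths as in the read.clipboard.fwf call above
widths <- c(5, 2, rep(1, 5), rep(3, 4))

con <- textConnection(lines)
fixed <- read.fwf(con, widths = widths)   # columns are named V1 ... V11
close(con)

dim(fixed)   # number of rows and columns read
fixed$V1     # the 5-character first field
```

Checking dim() against the number of widths is a quick way to confirm that the field specification matches the data.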

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data), and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.)

> library(psych)
> data(sat.act)
> describe(sat.act) #basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
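Most of these columns correspond to simple base R computations. As a rough sketch (the vector `x` is made up, the helper name `basic_stats` is hypothetical, and the 10% trim is an assumption about the default rather than something stated in this section), the statistics for a single variable could be computed as:

```r
# Made-up illustrative data with one missing value; not taken from sat.act
x <- c(12, 15, 11, 19, 14, 13, 30, 16, 15, NA)

# Hypothetical helper: a base R approximation of some describe() columns
basic_stats <- function(x, trim = 0.1) {
  x <- x[!is.na(x)]                  # drop missing values
  n <- length(x)
  c(n       = n,
    mean    = mean(x),
    sd      = sd(x),
    median  = median(x),
    trimmed = mean(x, trim = trim),  # mean after trimming each tail
    mad     = mad(x),                # median absolute deviation (scaled)
    min     = min(x),
    max     = max(x),
    range   = max(x) - min(x),
    se      = sd(x) / sqrt(n))       # standard error of the mean
}

round(basic_stats(x), 2)
```

Comparing the mean with the trimmed mean and the median gives a quick impression of how much a single extreme value (here 30) pulls the mean.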

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> #basic descriptive statistics by a grouping variable.
> describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)

 Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38
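The same kind of breakdown can be sketched in base R by splitting a variable on the grouping factor and summarizing each piece. The data frame `dat` and the object name `by_group` below are made up for illustration; this is not how describeBy is implemented.

```r
# Made-up scores for two groups
dat <- data.frame(
  group = c(1, 1, 1, 2, 2, 2, 2),
  score = c(10, 12, 14, 20, 22, NA, 24)
)

# n, mean, sd and standard error within each level of the grouping variable
by_group <- do.call(rbind, lapply(split(dat$score, dat$group), function(v) {
  v <- v[!is.na(v)]                       # ignore missing values
  data.frame(n = length(v), mean = mean(v), sd = sd(v),
             se = sd(v) / sqrt(length(v)))
}))
by_group
```

Note that n is counted after dropping missing values, which is why the group sizes for a variable can be smaller than the number of rows in the group (as for SATQ above).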

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
+ skew=FALSE,ranges=FALSE,mat=TRUE)
> headTail(sa.mat)

        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...     <NA>   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
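The computation described here can be sketched with base R alone. The example below uses simulated data with one planted outlier; the object names are illustrative, and it is a sketch of the idea rather than the code of the outlier function.

```r
set.seed(42)                           # simulated data, for illustration only
x <- matrix(rnorm(200 * 3), ncol = 3)
x[1, ] <- c(6, -6, 6)                  # plant one obvious multivariate outlier

# Squared Mahalanobis distance of each row from the multivariate centroid
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Expected chi-square quantiles, df = number of variables
expected <- qchisq(ppoints(nrow(x)), df = ncol(x))

# Q-Q plot: points far above the line are candidate outliers
if (interactive()) {
  plot(expected, sort(d2),
       xlab = "Chi-square quantiles", ylab = "Mahalanobis D2")
  abline(0, 1)
}

head(order(d2, decreasing = TRUE), 5)  # row numbers of the most extreme cases
```

Sorting the distances, as in the last line, is the same idea as sorting d2 to find the flagged cases.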

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

> png( 'outlier.png' )
> d2 <- outlier(sat.act,cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the x axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

> x <- matrix(1:120,ncol=10,byrow=TRUE)
> colnames(x) <- paste('V',1:10,sep='')
> new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
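The replacement rule in the example above is easy to state in base R. The following sketch reproduces the same result without calling scrub; the names `cols`, `lo`, `hi` and `bad` are hypothetical, and the loop illustrates the logic only, not the package's implementation.

```r
x <- matrix(1:120, ncol = 10, byrow = TRUE)
colnames(x) <- paste0("V", 1:10)

cols <- 3:5              # columns to clean
lo   <- c(30, 40, 50)    # per-column minimum
hi   <- 70               # common maximum
bad  <- 45               # a specific value to treat as missing

new.x <- x
for (i in seq_along(cols)) {
  v <- new.x[, cols[i]]
  v[v < lo[i] | v > hi | v == bad] <- NA   # out of range or flagged value
  new.x[, cols[i]] <- v
}

colSums(is.na(new.x))    # number of values set to NA in each column
```

Counting the NA values per column afterwards is a simple check that only the intended columns were touched.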

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes and may be plotted in e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
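Both recodings can be illustrated with base R factors. The vector `major` below is made up, and the code is only a sketch of what dummy coding and character-to-numeric conversion mean; the output of dummy.code and char2numeric may be arranged differently.

```r
major <- c("psych", "bio", "psych", "math", "bio")   # made-up categories

# Dummy codes: one 0/1 column per category
f <- factor(major)
dummies <- sapply(levels(f), function(lv) as.integer(f == lv))
dummies

# Numeric codes for a character variable (levels in alphabetical order)
codes <- as.numeric(f)
codes
```

Each row of `dummies` contains exactly one 1, so the columns are linearly dependent; in a regression one category is normally dropped as the reference group.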

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data and for communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean with the axis length reflecting one standard deviation of the x and y variables is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).
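The two kinds of summary contrasted here can be computed directly in base R. In the sketch below the scores are simulated and the object names are illustrative; it shows the quantities behind a box plot and a violin plot, not the drawing code of violinBy.

```r
set.seed(1)                                   # simulated scores, illustration only
score <- c(rnorm(100, 600, 100), rnorm(100, 620, 110))
group <- rep(c("M", "F"), each = 100)

# Quartiles per group: the summary a box plot displays
quartiles <- sapply(split(score, group), quantile,
                    probs = c(0.25, 0.5, 0.75))
quartiles

# Kernel density per group: the outline a violin plot mirrors around its axis
dens <- lapply(split(score, group), density)
sapply(dens, function(d) d$bw)                # bandwidth chosen for each group
```

The density estimate carries information about shape (skew, multiple modes) that the three quartiles alone cannot show.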


> png( 'pairspanels.png' )
> sat.d2 <- data.frame(sat.act,d2) #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.


> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+ main="Affect varies by movies ")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


> keys <- make.keys(msq[1:75],list(
+ EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+ "lively", "-sleepy", "-tired", "-drowsy"),
+ TA =c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+ "-placid", "-calm", "-at.rest") ,
+ PA =c("active", "excited", "strong", "inspired", "determined", "attentive",
+ "interested", "enthusiastic", "proud", "alert"),
+ NAf =c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+ "upset", "hostile", "irritable" )) )
> scores <- scoreItems(keys,msq[,1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+ main ="Density distributions of four measures of affect" )
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3,896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

                    17

> data(sat.act)
> violinBy(sat.act[5:6], sat.act$gender, grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")

[Plot: "Density Plot by gender for SAT V and Q". Violin plots for SATV M, SATV F, SATQ M, and SATQ F; y axis "Observed", 200 to 800.]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and the 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
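The two quantities behind these error bars can be computed by hand in base R. The following is only a sketch with made-up numbers: the scores, the proportion p, and the sample size N are hypothetical, and the normal quantile is assumed for the interval (the packaged functions may use a different quantile or additional options).

```r
# Made-up scores, used only to illustrate the arithmetic.
x <- c(480, 520, 610, 700, 650, 590, 560, 630)

# Standard error of the mean and a normal-theory 95% interval.
se_mean <- sd(x) / sqrt(length(x))
ci_mean <- mean(x) + c(-1, 1) * qnorm(0.975) * se_mean

# Standard error of a proportion: sigma_p = sqrt(p * q / N), with q = 1 - p.
p <- 0.30   # hypothetical cell proportion
N <- 700    # hypothetical sample size
se_prop <- sqrt(p * (1 - p) / N)
ci_prop <- p + c(-1, 1) * qnorm(0.975) * se_prop

print(round(ci_mean, 1))
print(round(c(se = se_prop, lower = ci_prop[1], upper = ci_prop[2]), 4))
```

Both intervals are symmetric around the estimate, which is the property noted in the text for normal-theory error bars.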

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. For a discussion of the problems in presenting data this way, see http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which are always assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10], epi.bfi$epilie<4)

[Plot: "0.95% confidence limits". x axis "Independent Variable" (bfagree, bfcon, bfext, bfneur, bfopen); y axis "Dependent Variable", 50 to 150.]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+     labels=c("Male","Female"), ylab="SAT score", xlab="")

[Plot: "0.95% confidence limits". Bar graph for Male and Female; y axis "SAT score", 200 to 800.]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+     main="Proportion of sample by education level")

[Plot: "Proportion of sample by education level". x axis "Level of Education" (labels M 0 through M 5 shown); y axis "Proportion of Education Level", 0.00 to 0.30.]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds row wise percentages. The data can be shown as percentages (as here) or as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+     cex=2,colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+     ylab="Negative Affect", pch=16,cex=2,colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Plot, two panels. Left: "Movies effect on arousal", x axis "Energetic Arousal" (8 to 20), y axis "Tense Arousal" (10 to 22). Right: "Movies effect on affect", x axis "Positive Affect" (6 to 12), y axis "Negative Affect" (2 to 10). Points in both panels are labeled Sad, Horror, Neutral, and Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi, bi.bars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these raw probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
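As a small illustration of the Holm correction mentioned above, the sketch below applies p.adjust from the stats package to a made-up vector of raw probabilities and repeats the calculation by hand. The p values are hypothetical and are not taken from any data set in this document.

```r
# Hypothetical raw (unadjusted) two-tailed probabilities.
p_raw <- c(0.001, 0.01, 0.03, 0.04, 0.20)

# Holm (1979) step-down adjustment using the stats package.
p_holm <- p.adjust(p_raw, method = "holm")

# The same adjustment by hand: sort, multiply the i-th smallest p by
# (m - i + 1), enforce monotonicity with cummax, and cap at 1.
m <- length(p_raw)
o <- order(p_raw)
adj_sorted <- pmin(1, cummax((m - seq_len(m) + 1) * p_raw[o]))
p_manual <- numeric(m)
p_manual[o] <- adj_sorted

print(rbind(raw = p_raw, holm = p_holm, manual = p_manual))
```

The adjusted values are never smaller than the raw ones, which is why several entries above the diagonal in Table 1 are larger than their counterparts below it.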

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53
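The arithmetic behind this first case can be checked in base R. This is a sketch under two assumptions that are not spelled out in the output above: the t statistic has n - 2 degrees of freedom, and the confidence interval comes from the Fisher z transformation with standard error 1/sqrt(n - 3). The variable names are illustrative only.

```r
n <- 50
r <- 0.3

# t test of a single correlation against zero, df = n - 2.
t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_value), df = n - 2)

# 95% confidence interval via the Fisher z transformation
# (atanh is the r-to-z transform, tanh converts back).
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(c(t = t_value, p = p_value, lower = ci[1], upper = ci[2]), 3))
```

Under these assumptions the rounded t value and interval agree with the 2.18 and 0.02 to 0.53 shown in the output above.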

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability 0.32
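The second case follows directly from the description: transform both correlations to Fisher z, take the difference, and divide by the standard error of that difference. A minimal base R sketch, assuming the usual standard error sqrt(1/(n - 3) + 1/(n2 - 3)) for two independent samples:

```r
n   <- 30
n2  <- n        # second sample size defaults to the first
r12 <- 0.4
r34 <- 0.6

# Difference between the Fisher z transformed correlations.
z_diff <- atanh(r12) - atanh(r34)

# Standard error of the difference of two independent z scores.
se_diff <- sqrt(1 / (n - 3) + 1 / (n2 - 3))

z_value <- z_diff / se_diff
p_value <- 2 * pnorm(-abs(z_value))

print(round(c(z = abs(z_value), p = p_value), 2))
```

The absolute z value and the two-tailed probability round to the 0.99 and 0.32 reported above.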

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
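The idea behind this test can be sketched in base R with simulated data. The sketch assumes that the statistic is (n - 3) times the sum of the squared Fisher z values of the off diagonal correlations, with one degree of freedom per unique correlation; the packaged function may differ in its details (for example, in how missing data or two matrices are handled). The data and the loading of 0.7 are made up.

```r
set.seed(123)
n <- 200

# Simulate four variables that share one common factor (made-up loadings).
f <- rnorm(n)
X <- sapply(1:4, function(i) 0.7 * f + rnorm(n))
R <- cor(X)

# Fisher z of the unique off diagonal correlations.
z <- atanh(R[lower.tri(R)])

# Sum of squared z scores, scaled by (n - 3), referred to chi square.
chi2    <- (n - 3) * sum(z^2)
df      <- length(z)              # p * (p - 1) / 2 unique correlations
p_value <- pchisq(chi2, df, lower.tail = FALSE)

print(c(chi2 = chi2, df = df, p = p_value))
```

With six variables, as in sat.act, the same counting rule gives the 15 degrees of freedom shown in the output above.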

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
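A short simulation in base R shows how much φ underestimates the latent correlation. This is only a sketch: the sample size and seed are arbitrary, both variables are cut at 0, and the closed form sin(pi * phi / 2) used to recover the latent correlation holds only for this special case of median cuts. It is not the general estimation procedure, which must be found by optimization.

```r
set.seed(2017)
n   <- 100000
rho <- 0.5

# Bivariate normal data with latent correlation rho.
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Dichotomize both variables at 0 and correlate the 0/1 scores: this is phi.
phi <- cor(as.numeric(x > 0), as.numeric(y > 0))

# For cuts at the median, phi = (2 / pi) * asin(rho), so the latent
# correlation can be recovered as sin(pi * phi / 2).
rho_recovered <- sin(pi * phi / 2)

print(round(c(latent = cor(x, y), phi = phi, recovered = rho_recovered), 2))
```

With rho = 0.5 the expected φ is 1/3, in line with the rho = 0.5, phi = 0.33 annotation in Figure 14.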

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix formed from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will also sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
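The smoothing steps described above can be sketched in a few lines of base R. The function name smooth_cor, the floor eps, and the example matrix are all hypothetical and chosen for illustration; this is a sketch of the idea, not the packaged implementation.

```r
# Sketch of eigenvalue smoothing: floor the eigenvalues, rescale them to
# sum to the number of variables, rebuild the matrix, and restore the unit
# diagonal. The floor eps is an arbitrary choice for this illustration.
smooth_cor <- function(R, eps = 1e-6) {
  e    <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eps)              # make small/negative values positive
  vals <- vals * nrow(R) / sum(vals)       # rescale to sum to number of variables
  S    <- e$vectors %*% diag(vals) %*% t(e$vectors)
  cov2cor(S)                               # back to a correlation matrix
}

# A made-up, internally inconsistent "correlation" matrix.
R_bad <- matrix(c( 1.0,  0.9,  0.9,
                   0.9,  1.0, -0.9,
                   0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)

min_before <- min(eigen(R_bad, symmetric = TRUE)$values)   # negative
R_ok       <- smooth_cor(R_bad)
min_after  <- min(eigen(R_ok, symmetric = TRUE)$values)    # positive

print(round(R_ok, 2))
print(c(before = min_before, after = min_after))
```

The smoothed matrix keeps the signs of the original correlations but shrinks their magnitudes until the matrix is admissible.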

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use

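> draw.tetra()

[Plot: a bivariate normal distribution with rho = 0.5 cut at thresholds τ (on X) and Τ (on Y), giving phi = 0.33. The four quadrants are labeled X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; and X < τ, Y < Τ. The marginal normal densities, dnorm(x), mark the regions X > τ and Y > Τ. Both axes run from -3 to 3.]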

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20, cuts=c(0,0))

[Plot: "Bivariate density, rho = 0.5". Perspective plot of the bivariate normal density with axes x, y, and z.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
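Equation 1 can be verified numerically in base R. The sketch below simulates two variables in 20 made-up groups, splits each score into its group mean and its deviation from that mean, and rebuilds the overall correlation from the four etas and the two component correlations. The simulation settings and variable names are arbitrary; the between group correlation is computed on group means repeated for every member, so it is weighted by group size.

```r
set.seed(42)
n_groups <- 20
n_per    <- 50
g <- rep(seq_len(n_groups), each = n_per)

# Made-up data: positive relation between groups, negative within groups.
gx <- rnorm(n_groups)
gy <- 0.8 * gx + rnorm(n_groups, sd = 0.6)
x  <- gx[g] + rnorm(length(g))
y  <- gy[g] - 0.5 * (x - gx[g]) + rnorm(length(g))

# Between group part (group means) and within group part (deviations).
x_bg <- ave(x, g); x_wg <- x - x_bg
y_bg <- ave(y, g); y_wg <- y - y_bg

# Etas: correlations of the raw scores with each part.
eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)

r_xy    <- cor(x, y)
r_xy_wg <- cor(x_wg, y_wg)
r_xy_bg <- cor(x_bg, y_bg)

rebuilt <- eta_x_wg * eta_y_wg * r_xy_wg + eta_x_bg * eta_y_bg * r_xy_bg
print(round(c(r_xy = r_xy, r_wg = r_xy_wg, r_bg = r_xy_bg, rebuilt = rebuilt), 3))
```

In this example the within and between group correlations have opposite signs, so the overall correlation on its own says little about either level.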

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.69              0.63              0.50              0.58              0.48

multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.48              0.40              0.25              0.34              0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.59              0.58              0.49              0.58              0.45

Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.34              0.34              0.24              0.33              0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
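The reason regression can be done from a correlation matrix alone is that the standardized beta weights depend only on the correlations: beta is the solution of Rxx beta = rxy, and R2 is the sum of beta times rxy. The base R sketch below checks this against lm on standardized simulated data. The data, the coefficients, and the variable names are made up for illustration, and the sketch covers a single y variable only.

```r
set.seed(1)
n <- 200

# Made-up predictors (x2 is correlated with x1) and criterion.
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
X[, 2] <- X[, 2] + 0.5 * X[, 1]
y <- drop(X %*% c(0.4, 0.3, 0.1)) + rnorm(n)

# Everything below uses only the correlation matrix.
R   <- cor(cbind(X, y = y))
Rxx <- R[1:3, 1:3]
rxy <- R[1:3, "y"]

beta <- solve(Rxx, rxy)        # standardized beta weights
R2   <- sum(beta * rxy)        # squared multiple correlation
vif  <- diag(solve(Rxx))       # 1 / (1 - SMC) for each predictor

# Check against lm on standardized raw data.
ys  <- as.numeric(scale(y))
Xs  <- scale(X)
fit <- lm(ys ~ Xs)

print(round(rbind(from_R = beta, from_lm = coef(fit)[-1]), 3))
print(round(c(R2_from_R = R2, R2_from_lm = summary(fit)$r.squared), 3))
print(round(vif, 2))
```

Standard errors are not recoverable from the correlations alone, which is why the number of subjects must be supplied separately when the input is a matrix.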

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.58              0.46              0.21              0.18              0.30

multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
            0.331             0.210             0.043             0.032             0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.44              0.35              0.17              0.14              0.26

Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.19              0.12              0.03              0.02              0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

                    > round(sc$residual, 2)

                                    FourLetterWords Suffixes LetterSeries Pedigrees
                    FourLetterWords            0.52     0.11         0.09      0.06
                    Suffixes                   0.11     0.60        -0.01      0.01
                    LetterSeries               0.09    -0.01         0.75      0.28
                    Pedigrees                  0.06     0.01         0.28      0.66
                    LetterGroup                0.13     0.03         0.37      0.20
                                    LetterGroup
                    FourLetterWords        0.13
                    Suffixes               0.03
                    LetterSeries           0.37
                    Pedigrees              0.20
                    LetterGroup            0.77
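The residual matrix printed above is what is left of the criterion variables once the other variables have been partialled out. As a minimal sketch of that idea (not the psych implementation), the following base R code uses simulated data and hypothetical variable names to show that the matrix expression Syy - Syx Sxx^-1 Sxy gives the same result as the covariance of the residuals from lm. The same identity applies when the input is a correlation matrix.

```r
# Hypothetical simulated data: two predictors (x1, x2) and three criteria (y1..y3)
set.seed(42)
n <- 200
X <- matrix(rnorm(n * 2), n, 2, dimnames = list(NULL, c("x1", "x2")))
E <- matrix(rnorm(n * 3), n, 3)
Y <- X %*% matrix(c(0.5, 0.2, 0.3, 0.4, 0.0, 0.6), 2, 3) + E
colnames(Y) <- c("y1", "y2", "y3")

# Partition the full covariance matrix into the x and y blocks
S   <- cov(cbind(X, Y))
Sxx <- S[1:2, 1:2]
Sxy <- S[1:2, 3:5]
Syy <- S[3:5, 3:5]

# Residual covariance of the y set with the x set removed
resid_matrix <- Syy - t(Sxy) %*% solve(Sxx) %*% Sxy

# The same quantity from the raw data: covariance of the regression residuals
resid_lm <- cov(residuals(lm(Y ~ X)))

round(resid_matrix, 2)
```

Working from the matrix alone is what makes it possible to partial out variables when only a correlation or covariance matrix, and not the raw data, is available.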

                    5.2 Mediation and Moderation analysis

                    Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
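To make the a, b, and c paths concrete, here is a minimal base R sketch on simulated data. It is not the mediate function: the data, the helper name paths, and the use of a simple percentile bootstrap are all assumptions made for illustration.

```r
# Simulated data standing in for a predictor x, mediator m, and outcome y
set.seed(1)
n <- 100
x <- rbinom(n, 1, 0.5)
m <- 0.8 * x + rnorm(n)
y <- 0.4 * m + 0.4 * x + rnorm(n)
d <- data.frame(x, m, y)

# Path estimates for one data set (hypothetical helper, not part of psych)
paths <- function(d) {
  a       <- coef(lm(m ~ x, data = d))[["x"]]    # x -> m
  fit     <- lm(y ~ x + m, data = d)
  b       <- coef(fit)[["m"]]                    # m -> y, holding x constant
  c_prime <- coef(fit)[["x"]]                    # direct effect of x on y
  c_total <- coef(lm(y ~ x, data = d))[["x"]]    # total effect of x on y
  c(a = a, b = b, c_total = c_total, c_prime = c_prime, ab = a * b)
}

est <- paths(d)

# Percentile bootstrap of the indirect effect ab
n_boot  <- 1000
boot_ab <- replicate(n_boot, paths(d[sample(n, replace = TRUE), ])[["ab"]])
ci      <- quantile(boot_ab, c(0.025, 0.975))

round(est, 2)
round(ci, 2)
```

For ordinary least squares the total effect decomposes exactly into the direct effect plus ab, which is why the indirect effect can be read either as the product of the a and b paths or as the drop in the x weight once m is added.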


                    Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

                    Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

                    The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

                    Total Direct effect(c) of THERAPY on SATIS = 0.76  SE = 0.31  t direct = 2.5  with probability = 0.019
                    Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  SE = 0.32  t direct = 1.35  with probability = 0.19
                    Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
                    Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
                    R2 of model = 0.31
                    To see the longer output, specify short = FALSE in the print statement

                    Full output

                    Total effect estimates (c)
                            SATIS   se   t   Prob
                    THERAPY  0.76 0.31 2.5 0.0186

                    Direct effect estimates (c')
                            SATIS   se    t  Prob
                    THERAPY  0.43 0.32 1.35 0.190
                    ATTRIB   0.40 0.18 2.23 0.034

                    'a' effect estimates
                           THERAPY  se    t   Prob
                    ATTRIB    0.82 0.3 2.74 0.0106

                    'b' effect estimates
                           SATIS   se    t  Prob
                    ATTRIB   0.4 0.18 2.23 0.034

                    'ab' effect estimates
                            SATIS boot   sd lower upper
                    THERAPY  0.33 0.32 0.17  0.04  0.69

                    • setCor will take raw data or a correlation matrix and find (and graph the path diagram for) the regressions of multiple y variables on multiple x variables.

                    setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

                    • mediate will take raw data or a correlation matrix and find (and graph the path diagram for) the effects of multiple x variables on multiple y variables, mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

                    mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

                    • mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000.

                    > mediate.diagram(preacher)

                    [Figure: path diagram titled "Mediation model". THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS c = 0.76, c' = 0.43]

                    Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct effect, c', is .43), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

                    > preacher <- setCor(1, c(2, 3), sobel, std = FALSE)
                    > setCor.diagram(preacher)

                    [Figure: path diagram titled "Regression Models". THERAPY to SATIS = 0.43; ATTRIB to SATIS = 0.4; THERAPY and ATTRIB linked by 0.21]

                    Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

                    5.3 Set Correlation

                    An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

                    $$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

                    where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

                    $$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

                    Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic, based upon the average canonical correlation, might be more appropriate.
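The relation between the squared canonical correlations, the set correlation, and the average squared canonical correlation can be sketched in a few lines of base R. This is an illustration under stated assumptions rather than the setCor code: the function name set_cor_sketch and the simulated data are hypothetical, and the eigenvalues are cross-checked against stats::cancor.

```r
# Sketch only: squared canonical correlations and Cohen's set correlation
# from a correlation matrix R. Function name and arguments are hypothetical.
set_cor_sketch <- function(R, x, y) {
  Rxx <- R[x, x, drop = FALSE]
  Ryy <- R[y, y, drop = FALSE]
  Rxy <- R[x, y, drop = FALSE]
  M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
  # M is not symmetric, so eigen() may return complex numbers with zero
  # imaginary parts; keep the real parts of the leading min(p, q) values
  lambda <- Re(eigen(M, only.values = TRUE)$values)
  lambda <- lambda[seq_len(min(length(x), length(y)))]
  list(lambda      = lambda,
       set_R2      = 1 - prod(1 - lambda),
       mean_lambda = mean(lambda))
}

# Simulated data: three x variables, two y variables
set.seed(7)
n <- 300
X <- matrix(rnorm(n * 3), n, 3)
Y <- cbind(X[, 1] + rnorm(n), X[, 2] + rnorm(n))
out <- set_cor_sketch(cor(cbind(X, Y)), x = 1:3, y = 4:5)

# Cross-check against the canonical correlations from stats::cancor
cc2 <- cancor(X, Y)$cor^2

# The squared canonical correlations printed for the Thurstone example above
lambda_printed <- c(0.6280, 0.1478, 0.0076, 0.0049)
set_R2_printed  <- round(1 - prod(1 - lambda_printed), 2)  # set correlation R2
mean_R2_printed <- round(mean(lambda_printed), 2)          # average squared canonical correlation
```

Applied to the printed Thurstone values, one large canonical correlation is enough to push the set correlation to .69 while the average squared canonical correlation is only .2, which is the situation the preceding paragraph warns about.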

                    setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

                    Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

                    For this example, the analysis is done on a matrix (the covariance matrix C computed below) rather than the raw data.

                    > C <- cov(sat.act, use = "pairwise")
                    > model1 <- lm(ACT ~ gender + education + age, data = sat.act)
                    > summary(model1)

                    Call:
                    lm(formula = ACT ~ gender + education + age, data = sat.act)

                    Residuals:
                         Min       1Q   Median       3Q      Max
                    -25.2458  -3.2133   0.7769   3.5921   9.2630

                    Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
                    (Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
                    gender      -0.48606    0.37984  -1.280  0.20110
                    education    0.47890    0.15235   3.143  0.00174 **
                    age          0.01623    0.02278   0.712  0.47650
                    ---
                    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                    Residual standard error: 4.768 on 696 degrees of freedom
                    Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
                    F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

                    Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
                        mod = "gender", n.iter = 50, std = TRUE)

                    The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

                    Total Direct effect(c) of ACT on SATQ = 0.58  SE = 0.03  t direct = 19.25  with probability = 0
                    Direct effect (c') of ACT on SATQ removing education = 0.59  SE = 0.03  t direct = 19.26  with probability = 0
                    Indirect effect (ab) of ACT on SATQ through education = -0.01
                    Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0
                    Total Direct effect(c) of gender on SATQ = -0.14  SE = 0.03  t direct = -4.78  with probability = 2.1e-06
                    Direct effect (c') of gender on NA removing education = -0.14  SE = 0.03  t direct = -4.63  with probability = 4.4e-06
                    Indirect effect (ab) of gender on SATQ through education = 0
                    Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0
                    Total Direct effect(c) of ACTXgndr on SATQ = 0  SE = 0.03  t direct = 0.02  with probability = 0.99
                    Direct effect (c') of ACTXgndr on NA removing education = 0  SE = 0.03  t direct = 0.01  with probability = 0.99
                    Indirect effect (ab) of ACTXgndr on SATQ through education = 0
                    Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0
                    R2 of model = 0.37
                    To see the longer output, specify short = FALSE in the print statement

                    Full output

                    Total effect estimates (c)
                              SATQ   se     t     Prob
                    ACT       0.58 0.03 19.25 0.00e+00
                    gender   -0.14 0.03 -4.78 2.10e-06
                    ACTXgndr  0.00 0.03  0.02 9.85e-01

                    Direct effect estimates (c')
                              SATQ   se     t     Prob
                    ACT       0.59 0.03 19.26 0.00e+00
                    gender   -0.14 0.03 -4.63 4.37e-06
                    ACTXgndr  0.00 0.03  0.01 9.92e-01

                    'a' effect estimates
                             education   se     t     Prob
                    ACT           0.16 0.04  4.22 2.77e-05
                    gender        0.09 0.04  2.50 1.28e-02
                    ACTXgndr     -0.01 0.04 -0.15 8.83e-01

                    'b' effect estimates
                               SATQ   se     t  Prob
                    education -0.04 0.03 -1.45 0.147

                    'ab' effect estimates
                              SATQ  boot   sd lower upper
                    ACT      -0.01 -0.01 0.01     0     0
                    gender    0.00  0.00 0.00     0     0
                    ACTXgndr  0.00  0.00 0.00     0     0

                    [Figure: path diagram titled "Moderation model". ACT, gender, and ACTXgndr predict SATQ both directly and through education; the path labels repeat the a, b, c, and c' estimates in the output above]

                    Figure 18: Moderated multiple regression requires the raw data.

                    Compare the lm output with the output from setCor.

                    > #compare with setCor
                    > setCor(c(4:6), c(1:3), C, n.obs = 700)

                    Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

                    Multiple Regression from matrix input

                    Beta weights
                                ACT  SATV  SATQ
                    gender    -0.05 -0.03 -0.18
                    education  0.14  0.10  0.10
                    age        0.03 -0.10 -0.09

                    Multiple R
                     ACT SATV SATQ
                    0.16 0.10 0.19
                    multiple R2
                       ACT   SATV   SATQ
                    0.0272 0.0096 0.0359

                    Multiple Inflation Factor (VIF) = 1/(1-SMC) =
                       gender education       age
                         1.01      1.45      1.44

                    Unweighted multiple R
                     ACT SATV SATQ
                    0.15 0.05 0.11
                    Unweighted multiple R2
                     ACT SATV SATQ
                    0.02 0.00 0.01

                    SE of Beta weights
                               ACT SATV SATQ
                    gender    0.18 4.29 4.34
                    education 0.22 5.13 5.18
                    age       0.22 5.11 5.16

                    t of Beta Weights
                                ACT  SATV  SATQ
                    gender    -0.27 -0.01 -0.04
                    education  0.65  0.02  0.02
                    age        0.15 -0.02 -0.02

                    Probability of t <
                               ACT SATV SATQ
                    gender    0.79 0.99 0.97
                    education 0.51 0.98 0.98
                    age       0.88 0.98 0.99

                    Shrunken R2
                       ACT   SATV   SATQ
                    0.0230 0.0054 0.0317

                    Standard Error of R2
                       ACT   SATV   SATQ
                    0.0120 0.0073 0.0137

                    F
                     ACT SATV SATQ
                    6.49 2.26 8.63

                    Probability of F <
                         ACT     SATV     SATQ
                    2.48e-04 8.08e-02 1.24e-05

                    degrees of freedom of regression
                    [1]   3 696

                    Various estimates of between set correlations
                    Squared Canonical Correlations
                    [1] 0.050 0.033 0.008
                    Chisq of canonical correlations
                    [1] 35.8 23.1  5.6

                    Average squared canonical correlation = 0.03
                    Cohen's Set Correlation R2 = 0.09
                    Shrunken Set Correlation R2 = 0.08
                    F and df of Cohen's Set Correlation 7.26 9 1681.86
                    Unweighted correlation between the two sets = 0.01

                    Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
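The agreement between lm and the matrix-based analysis follows from the fact that standardized regression weights depend only on the correlations. The base R sketch below illustrates this on simulated data; the data and variable names are assumptions, and the code is not taken from setCor.

```r
# Simulated data with three predictors and one criterion (hypothetical names)
set.seed(3)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$x3 <- 0.5 * d$x1 + rnorm(n)
d$y  <- 0.3 * d$x1 - 0.2 * d$x2 + 0.1 * d$x3 + rnorm(n)

# Regression from the correlation matrix alone
R    <- cor(d)
Rxx  <- R[1:3, 1:3]
rxy  <- R[1:3, 4]
beta <- solve(Rxx, rxy)      # standardized beta weights
R2   <- sum(beta * rxy)      # squared multiple correlation

# F test, which needs the number of observations as well as the matrix
k      <- length(beta)
F_stat <- (R2 / k) / ((1 - R2) / (n - k - 1))

# The same quantities from the raw data after standardizing every variable
fit <- lm(y ~ x1 + x2 + x3, data = as.data.frame(scale(d)))

round(cbind(from_matrix = beta, from_lm = coef(fit)[-1]), 3)
```

Plugging the printed values for ACT into the same F formula (R2 = 0.0272, 3 predictors, 700 observations) gives about 6.49, which matches both the lm and the setCor output above.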

                    6 Converting output to APA style tables using LaTeX

                    Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

                    An example of converting the output from fa to LaTeX appears in Table 2.

                    Table 2: fa2latex. A factor analysis table from the psych package in R

                    Variable          MR1    MR2    MR3    h2    u2   com
                    Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
                    Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
                    SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
                    FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
                    4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
                    Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
                    LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
                    Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
                    LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

                    SS loadings      2.64   1.86   1.5

                    MR1              1.00   0.59   0.54
                    MR2              0.59   1.00   0.52
                    MR3              0.54   0.52   1.00


                    7 Miscellaneous functions

                    A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

                    block.random  Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

                    df2latex  is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

                    cor2latex  Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

                    cosinor  One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

                    fisherz  Convert a correlation to the corresponding Fisher z score.

                    geometric.mean  also harmonic.mean find the appropriate mean for working with different kinds of data.

                    ICC  and cohen.kappa are typically used to find the reliability for raters.

                    headtail  combines the head and tail functions to show the first and last lines of a data set or output.

                    topBottom  Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

                    mardia  calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

                    p.rep  finds the probability of replication for an F, t, or r and estimates effect size.

                    partial.r  partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

                    rangeCorrection  will correct correlations for restriction of range.

                    reverse.code  will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

                    superMatrix  Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
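Several of the helpers listed above wrap short calculations. As a rough guide to what they compute, the base R lines below sketch the Fisher z transformation, the geometric and harmonic means, and a block-diagonal "super matrix". These are hand-written equivalents under the textbook definitions, not the psych source code, and the object names are made up for the example.

```r
# Fisher z transformation of a correlation
r <- 0.5
fisher_z <- atanh(r)                   # same as 0.5 * log((1 + r) / (1 - r))

# Geometric and harmonic means of a positive vector
x <- c(1, 2, 4, 8)
geometric_mean <- exp(mean(log(x)))    # 2^1.5, about 2.83
harmonic_mean  <- 1 / mean(1 / x)      # 4 / 1.875, about 2.13

# Block-diagonal combination: A on the top left, B on the lower right,
# zeros in the other two quadrants
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))
S
```

In practice the psych functions are more convenient because they also handle matrices, data frames, and missing values, details this sketch leaves out.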

                    8 Data sets

                    A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

                    Thurstone  Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

                    bfi  25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

                    sat.act  Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

                    epi.bfi  A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

                    iq  14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

                    galton  Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

                    Dwyer  Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

                    miscellaneous  cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

                    9 Development version and a user's guide

                    The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

                    Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

                    News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

                    > news(Version > "1.7.0", package = "psych")


                    10 Psychometric Theory

                    The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book An introduction to Psychometric Theory with Applications in R (Revelle, in prep)). More information about the use of some of the functions may be found in the book.

                    For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html A short guide to R.

                    11 SessionInfo

                    This document was prepared using the following settings.

                    > sessionInfo()

                    R Under development (unstable) (2017-03-05 r72309)
                    Platform: x86_64-apple-darwin13.4.0 (64-bit)
                    Running under: macOS Sierra 10.12.4

                    Matrix products: default
                    BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
                    LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

                    locale:
                    [1] C

                    attached base packages:
                    [1] stats     graphics  grDevices utils     datasets  methods   base

                    other attached packages:
                    [1] psych_1.7.4.21

                    loaded via a namespace (and not attached):
                    [1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
                    [5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
                    [9] lattice_0.20-34


                    References

                    Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

                    Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

                    Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

                    Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

                    Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

                    Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

                    Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

                    Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd ed. edition.

                    Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

                    Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

                    Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

                    Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis. 122 pp. Oxford, England.

                    Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

                    Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

                    Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


                    Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

                    Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

                    Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

                    Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):pp. 65–70.

                    Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

                    Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

                    Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

                    Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

                    Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

                    Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

                    MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

                    Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

                    McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

                    Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

                    Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


                    Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

                    Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

                    Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

                    Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

                    Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

                    Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

                    Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

                    Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

                    Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

                    Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

                    Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

                    Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

                    Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

                    Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
+     skew=FALSE,ranges=FALSE,mat=TRUE)
> headTail(sa.mat)

        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ². This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
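The distance-versus-χ² comparison described above can be sketched in base R. This is only an illustration under stated assumptions, not the code inside outlier: the toy data, the planted outlier, and the names x, d2_sketch, chi2_q and worst are all made up for the example.

```r
# Base-R sketch (not the psych::outlier implementation): compare squared
# Mahalanobis distances with the chi-square quantiles expected under normality.
set.seed(42)                                   # reproducible toy data (assumption)
x <- matrix(rnorm(200 * 3), ncol = 3)          # 200 cases on 3 variables
x[1, ] <- c(6, -6, 6)                          # plant one obvious outlier in case 1

# Squared distance of every case from the multivariate centroid
d2_sketch <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Expected chi-square quantiles, df = number of variables
chi2_q <- qchisq(ppoints(nrow(x)), df = ncol(x))

# Q-Q style display: sorted observed D2 against expected chi-square values
if (interactive()) {
  plot(chi2_q, sort(d2_sketch),
       xlab = "Quantiles of chi-square", ylab = "Mahalanobis D2")
}

# Indices of the five most extreme cases, largest distance first
worst <- order(d2_sketch, decreasing = TRUE)[1:5]
```

Sorting the distances, as in the last line, is the same idea as "sorting d2" to find the cases that deserve a second look.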

> png( 'outlier.png' )
> d2 <- outlier(sat.act,cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the X axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

> x <- matrix(1:120,ncol=10,byrow=TRUE)
> colnames(x) <- paste('V',1:10,sep='')
> new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
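For readers who want to see the replacement rule spelled out, the same cleaning can be sketched with base R alone. This is a hand-written equivalent of the rule described in the text, not the scrub source code; the names cols, lower, upper, bad_value and new_x are chosen for the illustration.

```r
# Base-R sketch of the cleaning rule described above (not psych::scrub itself)
x <- matrix(1:120, ncol = 10, byrow = TRUE)    # 12 rows, 10 columns
colnames(x) <- paste0("V", 1:10)

cols      <- 3:5               # columns to clean
lower     <- c(30, 40, 50)     # per-column minimum allowed value
upper     <- 70                # one maximum shared by the three columns
bad_value <- 45                # a specific value to be treated as missing

new_x <- x
for (j in seq_along(cols)) {
  v <- new_x[, cols[j]]
  # too small, too large, or exactly the flagged value becomes NA
  v[v < lower[j] | v > upper | v == bad_value] <- NA
  new_x[, cols[j]] <- v
}

# Number of values set to NA in each cleaned column
na_counts <- colSums(is.na(new_x[, cols]))
```

Counting the NA values per column is a quick check that the thresholds were applied column by column rather than globally.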

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and the results may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
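What dummy coding and character-to-numeric conversion amount to can be shown with a few lines of base R. The data below are invented, and the code is a sketch of the idea rather than the behavior of dummy.code or char2numeric in every detail.

```r
# Base-R sketch of dummy coding (made-up data; not the psych implementation)
major <- c("psych", "bio", "psych", "econ", "bio")   # a categorical variable
gpa   <- c(3.1, 3.6, 2.9, 3.4, 3.8)                  # a numeric outcome (hypothetical)

# One binary (0/1) column per category
levels_found <- sort(unique(major))
dummies <- sapply(levels_found, function(lev) as.integer(major == lev))

# Point-biserial correlation is just the Pearson r with a 0/1 variable
r_psych <- cor(dummies[, "psych"], gpa)

# Converting a categorical column to numeric codes (alphabetical level order)
major_num <- as.integer(factor(major))
```

Each row of dummies contains exactly one 1, which is why a regression normally uses all but one of the columns.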

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data and for communicating important results. Scatter Plot Matrices (SPLOMs) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMs) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).
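The two summaries contrasted here, quartiles (what a box plot draws) and a density estimate (what a violin plot draws), can both be computed by group in base R. The simulated scores and the names score, grp, quartiles and dens are assumptions made for this sketch; it does not reproduce violinBy.

```r
# Base-R sketch: the numbers behind a box plot and a violin plot, by group
set.seed(3)                                      # simulated scores (assumption)
score <- c(rnorm(100, mean = 600, sd = 100),     # group "M"
           rnorm(100, mean = 620, sd = 110))     # group "F"
grp   <- rep(c("M", "F"), each = 100)

# Box-plot ingredients: 25th, 50th and 75th percentiles for each group
quartiles <- tapply(score, grp, quantile, probs = c(0.25, 0.50, 0.75))

# Violin-plot ingredient: a kernel density estimate for each group
dens <- lapply(split(score, grp), density)

if (interactive()) {
  plot(dens[["M"]], main = "Density by group (sketch)")
  lines(dens[["F"]], lty = 2)
}
```

A violin plot mirrors each density around a vertical axis and overlays the quartiles, so it carries everything the box plot shows plus the shape of the distribution.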

> png( 'pairspanels.png' )
> sat.d2 <- data.frame(sat.act,d2) #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+     main="Affect varies by movies ")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75],list(
+     EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+            "lively", "-sleepy", "-tired", "-drowsy"),
+     TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+            "-placid", "-calm", "-at.rest"),
+     PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+            "interested", "enthusiastic", "proud", "alert"),
+     NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable" )) )
> scores <- scoreItems(keys,msq[,1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+     main ="Density distributions of four measures of affect" )
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

[Plot: "Density Plot by gender for SAT V and Q"; x axis: SATV M, SATV F, SATQ M, SATQ F; y axis: Observed, 200 to 800]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot, which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
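The interval that these functions draw for a single variable can be computed by hand. The sketch below uses made-up scores and a t-based multiplier, which is an assumption of the example; it shows the arithmetic only and is not taken from the error.bars source.

```r
# Base-R sketch: a 95% confidence interval for a mean from its standard error
scores <- c(12, 15, 11, 18, 14, 16, 13, 17)    # made-up data

n  <- length(scores)
m  <- mean(scores)
se <- sd(scores) / sqrt(n)                     # standard error of the mean

# Symmetric interval: mean +/- critical value * standard error
crit <- qt(0.975, df = n - 1)                  # t multiplier (assumption of this sketch)
ci95 <- m + c(-1, 1) * crit * se

# The alternatives mentioned in the text
one_sd <- m + c(-1, 1) * sd(scores)            # +/- one standard deviation
one_se <- m + c(-1, 1) * se                    # +/- one standard error
```

Because the interval is built as mean plus or minus a multiple of the standard error, it is symmetric by construction and cannot reflect skew in the raw data.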

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.
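The standard error of a proportion, √(pq/N), is easy to compute directly from a table of counts. The sketch below uses a small illustrative subset of gender by education counts (only two education levels, so the totals are not those of the full sat.act data) and expresses each cell as a proportion of the grand total; the variable names are hypothetical.

```r
# Base-R sketch: cell proportions and their standard errors from a table of counts
counts <- matrix(c(27, 20,
                   30, 25),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("M", "F"), education = c("0", "1")))

N    <- sum(counts)                 # grand total
p    <- counts / N                  # each cell as a proportion of the grand total
se_p <- sqrt(p * (1 - p) / N)       # standard error of each proportion

# Approximate 95% limits using the normal multiplier 1.96
lower <- p - 1.96 * se_p
upper <- p + 1.96 * se_p
```

Dividing by row totals or column totals instead of the grand total gives row-wise or column-wise proportions, with N in the formula replaced by the corresponding margin.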

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

[Plot: "0.95% confidence limits"; x axis: Independent Variable (bfagree, bfcon, bfext, bfneur, bfopen); y axis: Dependent Variable, 0 to 150]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+     labels=c("Male","Female"),ylab="SAT score",xlab="")

[Bar plot: "0.95% confidence limits"; x axis: Male, Female; y axis: SAT score, 200 to 800]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+     main="Proportion of sample by education level")

[Bar plot: "Proportion of sample by education level"; x axis: Level of Education; y axis: Proportion of Education Level, 0.00 to 0.30]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.
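The statistics behind such a two dimensional display are simply the group means and standard errors on each of the two variables. A base-R sketch follows, with simulated data and hypothetical column names (film, EA, TA); it illustrates the computation only and is not what errorCircles does internally.

```r
# Base-R sketch: group means and standard errors on two variables at once
set.seed(1)                                     # simulated data (assumption)
d <- data.frame(film = rep(c("Sad", "Happy"), each = 20),
                EA   = rnorm(40, mean = 12, sd = 3),
                TA   = rnorm(40, mean = 14, sd = 3))

std_err <- function(v) sd(v) / sqrt(length(v))  # standard error of a mean

group_means <- aggregate(cbind(EA, TA) ~ film, data = d, FUN = mean)
group_ses   <- aggregate(cbind(EA, TA) ~ film, data = d, FUN = std_err)

# Each group becomes one point (mean EA, mean TA) with a horizontal and a
# vertical error bar of +/- one standard error
if (interactive()) {
  plot(group_means$EA, group_means$TA, pch = 16,
       xlab = "Energetic Arousal", ylab = "Tense Arousal")
  arrows(group_means$EA - group_ses$EA, group_means$TA,
         group_means$EA + group_ses$EA, group_means$TA,
         angle = 90, code = 3, length = 0.05)
  arrows(group_means$EA, group_means$TA - group_ses$TA,
         group_means$EA, group_means$TA + group_ses$TA,
         angle = 90, code = 3, length = 0.05)
}
```

Plotting one point per condition with error bars in both directions makes it possible to see at a glance which conditions differ on one variable, on the other, or on both.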

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+     cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+     ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Two panels. Left: "Movies effect on arousal"; x axis: Energetic Arousal; y axis: Tense Arousal; points labeled Sad, Horror, Neutral, Happy. Right: "Movies effect on affect"; x axis: Positive Affect; y axis: Negative Affect; points labeled Sad, Horror, Neutral, Happy]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
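The lower off diagonal display can be approximated in base R. The following is only a sketch of the idea, not the lowerCor source: the helper name lower_display is hypothetical, and the built-in mtcars data stand in for sat.act so that the example runs without psych.

```r
# Sketch only: mimic a rounded lower-triangular display with base R.
# lower_display() is a hypothetical helper, not part of psych.
lower_display <- function(x, digits = 2) {
  r <- cor(x, use = "pairwise.complete.obs", method = "pearson")
  shown <- round(r, digits)
  shown[upper.tri(shown)] <- NA      # blank out the upper triangle
  print(shown, na.print = "")        # NA cells print as empty strings
  invisible(r)                       # return the full matrix invisibly
}

r_full <- lower_display(mtcars[, c("mpg", "wt", "hp", "qsec")])
```

Because the full matrix is returned invisibly, it can be stored and reused (as the text does with lowerCor) while only the lower triangle is printed.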

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
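The combined display can be sketched in base R. This is an illustration of the idea rather than the lowerUpper source: combine_lower_upper is a hypothetical helper, and two species from the built-in iris data stand in for the male and female subsets of sat.act.

```r
# Sketch only: put one correlation matrix below the diagonal and a second
# (or the difference first - second) above it.
# combine_lower_upper() is a hypothetical stand-in for the idea behind lowerUpper.
combine_lower_upper <- function(lower, upper, diff = FALSE) {
  stopifnot(all(dim(lower) == dim(upper)))
  out  <- lower
  fill <- if (diff) lower - upper else upper
  out[upper.tri(out)] <- fill[upper.tri(fill)]
  diag(out) <- NA                    # the diagonal is not meaningful here
  out
}

# Built-in iris data stand in for the two groups.
r_setosa    <- cor(iris[iris$Species == "setosa", 1:4])
r_virginica <- cor(iris[iris$Species == "virginica", 1:4])

both  <- combine_lower_upper(r_setosa, r_virginica)
diffs <- combine_lower_upper(r_setosa, r_virginica, diff = TRUE)
round(both, 2)
round(diffs, 2)
```

With diff = TRUE the entries above the diagonal are the first matrix minus the second, which matches the sign of the differences printed above (for example 0.61 - 0.52 = 0.09).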

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
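To make the Holm adjustment concrete, the short example below applies p.adjust from the stats package to four raw probabilities. The p values are made up for illustration and do not come from sat.act.

```r
# Holm (1979) step-down adjustment using p.adjust() from the stats package.
# The four raw p values are invented for illustration.
p_raw  <- c(0.01, 0.02, 0.03, 0.04)
p_holm <- p.adjust(p_raw, method = "holm")
p_holm
# The smallest p is multiplied by 4, the next by 3, and so on; the adjusted
# values are then forced to be non-decreasing and capped at 1.
```

Here the products are 0.04, 0.06, 0.06 and 0.04, so after the monotonicity step the adjusted values are 0.04, 0.06, 0.06 and 0.06.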

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n =  103 ,  r12 =  0.4 ,  r23 =  0.1 ,  r13 =  0.5 )"
Test of difference between two correlated  correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)  #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23
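The first three tests can be reproduced from textbook formulas in base R. The helpers below are hypothetical and are not the r.test source; they assume the usual t test for a single correlation, Fisher's r-to-z transformation for independent correlations, and Williams's t for two dependent correlations that share a variable. With the inputs used above they give the t of 2.18, the z of 0.99 (in absolute value) and the t of -0.89 that are printed in the output.

```r
# Sketch only: textbook formulas for three of the tests listed above.
# These helpers are hypothetical and are not the r.test source code.

# 1) t test and Fisher-z confidence interval for a single correlation.
single_r <- function(n, r, alpha = 0.05) {
  t_val <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p     <- 2 * pt(-abs(t_val), df = n - 2)
  se    <- 1 / sqrt(n - 3)
  ci    <- tanh(atanh(r) + c(-1, 1) * qnorm(1 - alpha / 2) * se)
  list(t = t_val, p = p, ci = ci)
}

# 2) z test for two independent correlations (Fisher r-to-z).
independent_r <- function(n, r12, r34, n2 = n) {
  z <- (atanh(r12) - atanh(r34)) / sqrt(1 / (n - 3) + 1 / (n2 - 3))
  list(z = z, p = 2 * pnorm(-abs(z)))
}

# 3) Williams's t for two dependent correlations sharing one variable.
dependent_r <- function(n, r12, r13, r23) {
  det_r <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23
  r_bar <- (r12 + r13) / 2
  t_val <- (r12 - r13) * sqrt(((n - 1) * (1 + r23)) /
             (2 * ((n - 1) / (n - 3)) * det_r + r_bar^2 * (1 - r23)^3))
  list(t = t_val, p = 2 * pt(-abs(t_val), df = n - 3))
}

single_r(50, 0.3)
independent_r(30, 0.4, 0.6)
dependent_r(103, 0.4, 0.5, 0.1)
```

The sign of the z in the second test depends only on which correlation is subtracted from which; the two-tailed probability is unaffected.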

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
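The idea behind this test can be sketched in base R. The helper identity_test is hypothetical, and scaling each squared Fisher z by (n - 3) is an assumption based on the usual standard error of Fisher's z; cortest may differ in its details, so the sketch is not expected to reproduce its output exactly. The built-in iris data are used in place of sat.act.

```r
# Sketch only: chi-square test that all population correlations are zero,
# based on squared Fisher z values. identity_test() is a hypothetical helper;
# the (n - 3) scaling is an assumption, and cortest may differ in detail.
identity_test <- function(R, n) {
  z    <- atanh(R[lower.tri(R)])     # Fisher z of each distinct correlation
  chi2 <- sum(z^2) * (n - 3)
  df   <- length(z)                  # p * (p - 1) / 2 correlations
  list(chi2 = chi2, df = df,
       p = pchisq(chi2, df, lower.tail = FALSE))
}

R <- cor(iris[, 1:4])                # built-in data, 150 complete cases
res <- identity_test(R, n = nrow(iris))
res
```

With four variables there are six distinct correlations, so the test has 6 degrees of freedom, just as the six variables of sat.act give the 15 degrees of freedom reported above.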

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
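The attenuation of φ can be seen in a small base R simulation. The settings (sample size, a latent correlation of 0.5, and cuts at 0 on both variables) are illustrative assumptions. For cuts at the medians of a bivariate normal distribution, φ = (2/π) arcsin(ρ), so the latent correlation can be recovered in closed form; in general the tetrachoric correlation has to be estimated numerically.

```r
# Sketch only: dichotomizing bivariate normal data shrinks the correlation.
# The simulation settings (n, rho, cuts at 0) are illustrative assumptions.
set.seed(42)
n   <- 100000
rho <- 0.5
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)   # cor(x, y) is about rho

x_cut <- as.numeric(x > 0)                  # cut both variables at 0
y_cut <- as.numeric(y > 0)

phi <- cor(x_cut, y_cut)                    # Pearson r on the 0/1 data
phi                                         # close to (2 / pi) * asin(rho) = 1/3

# For median cuts only, the latent correlation follows in closed form.
rho_hat <- sin(pi * phi / 2)
rho_hat                                     # close to rho = 0.5
```

The simulated φ of about one third agrees with the phi = 0.33 shown for rho = 0.5 in Figure 14.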

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
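The smoothing steps described above can be sketched in base R. The helper smooth_sketch is hypothetical and the tolerance eps is an arbitrary choice; cor.smooth may use different tolerances and details. The 3 x 3 matrix is an invented impossible pattern, not the burt data.

```r
# Sketch only: eigenvalue smoothing of a matrix that is not positive definite.
# smooth_sketch() is hypothetical; cor.smooth may use different tolerances.
smooth_sketch <- function(R, eps = 1e-6) {
  e    <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eps)                 # lift non-positive eigenvalues
  vals <- vals * length(vals) / sum(vals)     # rescale to sum to nvar
  S    <- e$vectors %*% diag(vals) %*% t(e$vectors)
  S    <- cov2cor(S)                          # restore the unit diagonal
  dimnames(S) <- dimnames(R)
  S
}

# An impossible pattern: A and B both correlate .9 with C but -.9 with each other.
R_bad <- matrix(c( 1.0, -0.9, 0.9,
                  -0.9,  1.0, 0.9,
                   0.9,  0.9, 1.0), nrow = 3,
                dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
min(eigen(R_bad, symmetric = TRUE)$values)    # negative
R_ok <- smooth_sketch(R_bad)
min(eigen(R_ok, symmetric = TRUE)$values)     # positive after smoothing
round(R_ok, 2)
```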

                      4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

> draw.tetra()

[Figure 14 plot: a bivariate normal scatter with rho = 0.5 and phi = 0.33, cut at τ on X and at Τ on Y into four quadrants (X > τ or X < τ, by Y > Τ or Y < Τ), with the marginal normal densities dnorm(x) drawn along each axis.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Figure 15 plot: perspective view of the bivariate normal density over x and y, titled "Bivariate density rho = 0.5".]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values, or the group means.
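Equation (1) can be checked numerically in base R. This sketch is not the statsBy source: it uses the built-in iris data grouped by Species as a stand-in for a real multilevel data set, and it computes the between group correlation on group means assigned back to each case, so that it is weighted by group size.

```r
# Sketch only: verify the decomposition in equation (1) with base R.
# iris (grouped by Species) stands in for a real multilevel data set.
x <- iris$Sepal.Length
y <- iris$Sepal.Width
g <- iris$Species

x_bg <- ave(x, g)          # group mean assigned to each case (between part)
y_bg <- ave(y, g)
x_wg <- x - x_bg           # deviation from own group mean (within part)
y_wg <- y - y_bg

eta_x_wg <- cor(x, x_wg);  eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg);  eta_y_bg <- cor(y, y_bg)
r_wg <- cor(x_wg, y_wg)    # pooled within group correlation
r_bg <- cor(x_bg, y_bg)    # between group correlation, weighted by group size

r_xy     <- cor(x, y)
r_decomp <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
round(c(r_xy = r_xy, r_decomp = r_decomp, r_wg = r_wg, r_bg = r_bg), 2)
```

In these data the overall correlation is negative even though the pooled within group correlation is positive, which illustrates why inferences from one level to the other are not warranted.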

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
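Why a correlation matrix is sufficient for standardized regression can be shown in a few lines of base R. This is a sketch of the underlying algebra, not the setCor source; the built-in mtcars data and the choice of variables are illustrative.

```r
# Sketch only: standardized regression weights from a correlation matrix.
# mtcars stands in for real data; this is not the setCor source code.
vars <- c("mpg", "wt", "hp")
R    <- cor(mtcars[, vars])

Rxx  <- R[c("wt", "hp"), c("wt", "hp")]      # predictor intercorrelations
rxy  <- R[c("wt", "hp"), "mpg"]              # predictor-criterion correlations

beta <- solve(Rxx, rxy)                      # standardized beta weights
R2   <- sum(beta * rxy)                      # squared multiple correlation

# The same values from lm() applied to standardized raw data.
z   <- as.data.frame(scale(mtcars[, vars]))
fit <- lm(mpg ~ wt + hp, data = z)
rbind(from_R = beta, from_lm = coef(fit)[-1])
c(R2_from_R = R2, R2_from_lm = summary(fit)$r.squared)
```

The two routes agree because the standardized weights depend on the data only through the correlations among the variables.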

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.69            0.63            0.50            0.58
    LetterGroup
           0.48
multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.48            0.40            0.25            0.34
    LetterGroup
           0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.59            0.58            0.49            0.58
    LetterGroup
           0.45
Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.34            0.34            0.24            0.33
    LetterGroup
           0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

 Average squared canonical correlation =  0.2
 Cohen's Set Correlation R2  =  0.69
 Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.58            0.46            0.21            0.18
    LetterGroup
           0.30
multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
          0.331           0.210           0.043           0.032
    LetterGroup
          0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.44            0.35            0.17            0.14
    LetterGroup
           0.26
Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.19            0.12            0.03            0.02
    LetterGroup
           0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

 Average squared canonical correlation =  0.21
 Cohen's Set Correlation R2  =  0.42
 Unweighted correlation between the two sets =  0.48
> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees
FourLetterWords            0.52     0.11         0.09      0.06
Suffixes                   0.11     0.60        -0.01      0.01
LetterSeries               0.09    -0.01         0.75      0.28
Pedigrees                  0.06     0.01         0.28      0.66
LetterGroup                0.13     0.03         0.37      0.20
                LetterGroup
FourLetterWords        0.13
Suffixes               0.03
LetterSeries           0.37
Pedigrees              0.20
LetterGroup            0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
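The path estimates and a bootstrap of the indirect effect can be sketched with ordinary regressions in base R. This is not the mediate function: the data are simulated, the variable names x, m and y are illustrative, and only 500 resamples are drawn to keep the example fast. To avoid ambiguity, the code names the total effect of x on y c_total and the effect of x with m in the model c_direct.

```r
# Sketch only: the a and b paths, the total and direct effects of x, and a
# small percentile bootstrap of ab. Data are simulated; this is not mediate().
set.seed(1)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.2 * x + rnorm(n)
d <- data.frame(x, m, y)

paths <- function(d) {
  a       <- coef(lm(m ~ x, data = d))[["x"]]       # x -> m
  fit_y   <- coef(lm(y ~ x + m, data = d))          # y on x and m together
  c_total <- coef(lm(y ~ x, data = d))[["x"]]       # total effect of x
  c(a = a, b = fit_y[["m"]], c_total = c_total, c_direct = fit_y[["x"]])
}

est <- paths(d)
ab  <- est[["a"]] * est[["b"]]                      # indirect effect
c(est, ab = ab)        # c_total equals c_direct + ab for least squares

# Percentile bootstrap of ab (500 resamples keeps the sketch fast).
boot_ab <- replicate(500, {
  p <- paths(d[sample(n, replace = TRUE), ])
  p[["a"]] * p[["b"]]
})
ci <- quantile(boot_ab, c(0.025, 0.975))
ci
```

For least squares estimates the total effect is exactly the direct effect plus ab, which is why a test of mediation reduces to a test of the product ab.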

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

Total Direct effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t direct =  2.5   with probability =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t direct =  1.35   with probability =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB   =  0.33
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.04    Upper CI =  0.69
R2 of model =  0.31
 To see the longer output, specify short = FALSE in the print statement

 Full output

 Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates     (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

 'a'  effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

 'b'  effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

 'ab'  effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"),x = c("education","age"),data = sat.act,std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"),x = c("education","age"),m= "ACT",data =sat.act,std=TRUE,n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Figure 16 plot: path diagram titled "Mediation model". THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, THERAPY to SATIS c = 0.76 and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure 17 plot: path diagram titled "Regression Models" with THERAPY and ATTRIB predicting SATIS; the values shown are 0.43, 0.4, and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic, based upon the average canonical correlation, might then be more appropriate.
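The set correlation and the average squared canonical correlation can both be computed from the eigenvalues of the matrix product Rxx^-1 Rxy Ryy^-1 Ryx. The sketch below uses base R only and is not the setCor implementation: the helper set_r2() is hypothetical, and the LifeCycleSavings data set is used purely because it ships with R (it does not appear in this vignette).

```r
# Hypothetical helper: squared canonical correlations and Cohen's set correlation
# from a correlation matrix R and the names of the two variable sets.
set_r2 <- function(R, x, y) {
  Rxx <- R[x, x, drop = FALSE]
  Ryy <- R[y, y, drop = FALSE]
  Rxy <- R[x, y, drop = FALSE]
  Ryx <- R[y, x, drop = FALSE]
  M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% Ryx
  lambda <- Re(eigen(M, only.values = TRUE)$values)  # squared canonical correlations
  list(lambda = lambda, R2 = 1 - prod(1 - lambda))
}

# Illustrative data (assumption): two sets of variables from LifeCycleSavings.
x <- c("pop15", "pop75")
y <- c("sr", "dpi", "ddpi")
R <- cor(LifeCycleSavings)

fwd <- set_r2(R, x, y)   # x as predictors, y as criteria
rev <- set_r2(R, y, x)   # roles reversed

# Cross-check against the canonical correlations from stats::cancor.
cc <- cancor(LifeCycleSavings[, x], LifeCycleSavings[, y])$cor

print(round(fwd$lambda, 3))       # squared canonical correlations
print(round(mean(cc^2), 3))       # average squared canonical correlation
print(round(c(forward = fwd$R2, reversed = rev$R2), 3))
```

Because 1 - λ is multiplied across all canonical variates, a single large eigenvalue is enough to push the set correlation towards 1, which is why the average squared canonical correlation can be the more conservative summary.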

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
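To see why a correlation matrix is sufficient, recall that the standardized regression weights are β = Rxx^-1 rxy and the squared multiple correlation is R2 = rxy'β. The following base R sketch checks this against lm on standardized raw data. It is not the setCor code, and the mtcars variables are an assumption chosen only because the data set ships with R.

```r
# Illustrative variables (assumption): predict mpg from wt and hp in mtcars.
ys <- "mpg"
xs <- c("wt", "hp")
R  <- cor(mtcars[, c(ys, xs)])

# Standardized weights and R2 from the correlation matrix alone.
beta <- solve(R[xs, xs], R[xs, ys])
R2   <- sum(beta * R[xs, ys])

# The same quantities from the raw data, after standardizing every variable.
z   <- as.data.frame(scale(mtcars[, c(ys, xs)]))
fit <- lm(mpg ~ wt + hp, data = z)

print(round(rbind(from_R = beta, from_lm = coef(fit)[xs]), 4))
print(round(c(from_R = R2, from_lm = summary(fit)$r.squared), 4))
```

Tests of significance additionally require the number of observations, which is why it must be supplied when the input is a matrix.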

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on a matrix computed from the data (here the pairwise covariance matrix C) rather than on the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58 S.E. = 0.03 t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59 S.E. = 0.03 t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.02 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14 S.E. = 0.03 t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14 S.E. = 0.03 t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.01 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 S.E. = 0.03 t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0 S.E. = 0.03 t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = 0 Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: path diagram titled "Moderation model", with ACT, gender, and ACTXgndr predicting SATQ through education. The a, b, c, and c' path coefficients are those printed in the output above.]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
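The general idea behind turning a numeric table into LaTeX can be sketched in a few lines of base R. The helper df_to_latex() below is hypothetical and much simpler than df2latex or fa2latex (no caption, no APA styling). It assumes the data frame has explicit row names, and the two example rows are the first two rows of loadings from Table 2.

```r
# Hypothetical helper: format a numeric data frame as a LaTeX tabular.
# Assumes df has explicit row names, which become the first column.
df_to_latex <- function(df, digits = 2) {
  m   <- as.matrix(df)
  fmt <- paste0("%.", digits, "f")
  rows <- vapply(seq_len(nrow(m)), function(i) {
    paste(c(rownames(m)[i], sprintf(fmt, m[i, ])), collapse = " & ")
  }, character(1))
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(m)), "}"),
    "\\hline",
    paste0(paste(c("Variable", colnames(m)), collapse = " & "), " \\\\"),
    "\\hline",
    paste0(rows, " \\\\"),
    "\\hline",
    "\\end{tabular}")
}

# First two rows of the loadings in Table 2.
loadings <- data.frame(MR1 = c(0.91, 0.89),
                       MR2 = c(-0.04, 0.06),
                       row.names = c("Sentences", "Vocabulary"))
tab <- df_to_latex(loadings)
cat(tab, sep = "\n")
```

The result is a character vector with one element per line, so it can be written to a file with writeLines and included in a LaTeX document.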


                      7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
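Several of these helpers compute simple, well-defined quantities, which can be written out in base R to make clear what is being calculated. The functions below are hypothetical stand-ins with made-up names, not the psych implementations, and they omit the argument checking and missing-data handling that the package versions provide.

```r
# Fisher r-to-z transformation (identical to atanh(r)).
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))

# Geometric mean: appropriate for positive data combined multiplicatively.
geo_mean <- function(x) exp(mean(log(x)))

# Harmonic mean: appropriate for averaging rates.
har_mean <- function(x) 1 / mean(1 / x)

# Block-diagonal "super matrix": A top left, B lower right, 0s elsewhere.
super_matrix <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

print(fisher_z(0.5))
print(geo_mean(c(1, 4)))
print(har_mean(c(1, 2, 4)))
S <- super_matrix(diag(2), matrix(1, 3, 3))
print(S)
```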

                      8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
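As a sketch of the kind of multidimensional scaling demonstration that a distance matrix such as cities supports, the following uses classical scaling from base R. It deliberately substitutes R's built-in UScitiesD distances (10 US cities) for the psych cities matrix so that it runs without psych installed; that substitution is an assumption of the example.

```r
# Classical (metric) multidimensional scaling in two dimensions.
# UScitiesD is a base R "dist" object, used here as a stand-in for psych's cities.
fit <- cmdscale(UScitiesD, k = 2)

# How well do the two-dimensional coordinates reproduce the original distances?
fitted_d <- dist(fit)
r <- cor(as.vector(UScitiesD), as.vector(fitted_d))
print(round(r, 3))

# Plot the recovered configuration (only when run interactively).
if (interactive()) {
  plot(fit, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
  text(fit, labels = rownames(fit))
}
```

The recovered map is only determined up to rotation and reflection, so the plot may appear flipped relative to a conventional map.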

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


                      10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                      11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                      References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components: an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


                      rtest 28

                      rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

                      R package

                      61

                      ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

                      rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

                      SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

                      spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

                      table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

                      vegetables 50 51violinBy 14 18vss 5 6

                      weighted least squares 6withinBetween 37

                      xtable 47

                      62


> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D², the x axis is the distribution of χ² for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.
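The comparison described in the Figure 1 caption can be sketched in base R. This is a minimal illustration on simulated data, not the internals of outlier: the matrix X, the planted extreme case, and all variable names are assumptions made for the example.

```r
# Sketch: squared Mahalanobis distances compared with chi-square quantiles.
# X and the planted outlier are illustrative assumptions, not psych data.
set.seed(42)
X <- matrix(rnorm(200 * 3), ncol = 3)
X[1, ] <- c(6, -6, 6)                      # plant one extreme case in row 1

# D2 of each row from the centroid, using the sample covariance matrix
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Theoretical chi-square quantiles with df = number of variables;
# plotting sort(d2) against chi2_q would give a Q-Q style display
chi2_q <- qchisq(ppoints(nrow(X)), df = ncol(X))

# Sorting d2 identifies the most extreme cases
most_extreme <- order(d2, decreasing = TRUE)[1:5]
most_extreme
```

Cases whose D² lies far above the matching chi-square quantile are the candidates worth inspecting by hand.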

Values in columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste('V', 1:10, sep='')
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x
       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
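The replacement rule applied in this example can be written out in base R. The sketch below reproduces the same result for the same 12 x 10 matrix; it is an illustration of the rule only, and the loop and the names cols, mins, max_val and is_value are assumptions rather than the way scrub is implemented.

```r
# Sketch: per-column minimum, a common maximum, and one "is value" rule,
# applied to columns 3 to 5. Names below are illustrative, not scrub arguments.
x <- matrix(1:120, ncol = 10, byrow = TRUE)
colnames(x) <- paste0("V", 1:10)

cols     <- 3:5
mins     <- c(30, 40, 50)   # one minimum per selected column
max_val  <- 70              # same maximum for every selected column
is_value <- 45              # exact value to treat as missing

new_x <- x
for (i in seq_along(cols)) {
  v <- new_x[, cols[i]]
  v[v < mins[i] | v > max_val | v == is_value] <- NA
  new_x[, cols[i]] <- v
}

sum(is.na(new_x))   # number of cells set to NA
```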

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may be using biserial or point biserial (regular Pearson r) to show effect sizes and may be plotted in e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
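Both ideas can be illustrated in base R. The sketch below is not the code of dummy.code or char2numeric; the vector major is made-up data, and the assumption that numeric codes follow alphabetical level order applies to this example only.

```r
# Sketch: dummy coding a categorical variable in base R.
# The vector `major` is made-up illustrative data.
major <- c("psych", "bio", "psych", "math", "bio")

# One binary (0/1) column per category
levels_major <- sort(unique(major))
dummies <- sapply(levels_major, function(level) as.integer(major == level))
dummies

# Converting a categorical column to numeric codes
# (here the codes follow the alphabetical order of the levels)
major_numeric <- as.numeric(factor(major))
major_numeric
```

Each row of dummies has exactly one 1, so any one column is redundant given the others; regression models usually drop one category for that reason.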

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)  # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+     main="Affect varies by movies")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+          "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[, 1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+     main="Density distributions of four measures of affect")
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6], sat.act$gender, grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")

[Figure 5 plot: title "Density Plot by gender for SAT V and Q"; y axis "Observed" (200 to 800); x axis groups SATV M, SATV F, SATQ M, SATQ F]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draw the confidence intervals for an x set and a y set of the same size.
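The two quantities these functions rest on, a confidence interval for a mean and the standard error of a proportion, can be computed by hand. The sketch below uses made-up numbers; the choice of a t critical value with n - 1 degrees of freedom is an assumption of this example and is not claimed to be what error.bars does internally.

```r
# Sketch: 95% confidence interval for a mean, and the standard error of a
# proportion. All data values here are made up for illustration.
x <- c(12, 15, 11, 14, 13, 16, 12, 15)
n <- length(x)

se_mean <- sd(x) / sqrt(n)              # standard error of the mean
t_crit  <- qt(0.975, df = n - 1)        # assumed critical value (t, n - 1 df)
ci_mean <- mean(x) + c(-1, 1) * t_crit * se_mean
ci_mean

# Standard error of a proportion: sigma_p = sqrt(p * q / N), with q = 1 - p
p <- 0.30
N <- 200
se_p <- sqrt(p * (1 - p) / N)
se_p
```

Because the interval is built from the mean plus or minus a multiple of the standard error, it is symmetric by construction, whatever the skew of the data.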

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10], epi.bfi$epilie<4)

[Figure 6 plot: title "0.95% confidence limits"; x axis "Independent Variable" (bfagree, bfcon, bfext, bfneur, bfopen); y axis "Dependent Variable"]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+     labels=c("Male","Female"), ylab="SAT score", xlab="")

[Figure 7 plot: title "0.95% confidence limits"; x axis groups Male, Female; y axis "SAT score" (200 to 800)]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+     main="Proportion of sample by education level")

[Figure 8 plot: title "Proportion of sample by education level"; x axis "Level of Education" (bars labeled M 0 through M 5); y axis "Proportion of Education Level" (0.00 to 0.30)]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
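The three ways of forming proportions named in the caption can be illustrated with base R's prop.table. The counts below are invented, and the use of the grand total as N in the standard error is an assumption of this sketch rather than a statement about how error.bars.tab computes its error bars.

```r
# Sketch: proportions of a two-way table computed three ways, with the
# standard error of each proportion. The counts are made-up illustrations.
tab <- matrix(c(30, 20, 10, 40), nrow = 2,
              dimnames = list(gender = c("M", "F"), education = c("0", "1")))

p_both    <- prop.table(tab)      # share of the grand total
p_rows    <- prop.table(tab, 1)   # each row sums to 1
p_columns <- prop.table(tab, 2)   # each column sums to 1

# Standard error of a proportion, sqrt(p * q / N), for the grand-total version
N <- sum(tab)
se_both <- sqrt(p_both * (1 - p_both) / N)
se_both
```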

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+     cex=2,colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+     ylab="Negative Affect", pch=16,cex=2,colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure 9 plot, two panels: "Movies effect on arousal" (x axis "Energetic Arousal", y axis "Tense Arousal") and "Movies effect on affect" (x axis "Positive Affect", y axis "Negative Affect"); points labeled Sad, Horror, Neutral, Happy]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi, bi.bars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
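A display of this kind can be approximated in base R. The sketch below is not the code of lowerCor or lowerMat; it uses the built-in mtcars data purely for illustration, and the three variables chosen are arbitrary.

```r
# Sketch: pairwise Pearson correlations shown as a lower triangle.
# Uses the built-in mtcars data purely for illustration.
vars <- mtcars[, c("mpg", "hp", "wt")]

# Full symmetric correlation matrix, using pairwise complete observations
r <- cor(vars, use = "pairwise.complete.obs", method = "pearson")

# Round to 2 digits and blank out the upper triangle for display
r_display <- round(r, 2)
r_display[upper.tri(r_display)] <- NA
print(r_display, na.print = "")
```

Keeping r as the full matrix while printing only r_display mirrors the idea of returning the complete correlations and showing just the lower triangle.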

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age   ACT  SATV SATQ
education        NA  0.09  0.00 -0.05 0.05
age            0.61    NA  0.07 -0.03 0.13
ACT            0.16  0.15    NA  0.08 0.02
SATV           0.02 -0.06  0.61    NA 0.05
SATQ           0.08  0.04  0.60  0.68   NA
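The layout of these combined matrices can be reproduced with base R indexing. The sketch below is only an illustration of the layout, not the implementation of lowerUpper; it reuses three of the male and female correlations printed above, and the names male_r and female_r are introduced here for the example.

```r
# Sketch: one group's correlations below the diagonal, the other's (or the
# difference) above it. Values are the education/age/ACT entries shown above.
vars <- c("education", "age", "ACT")
male_r <- matrix(c(1.00, 0.61, 0.16,
                   0.61, 1.00, 0.15,
                   0.16, 0.15, 1.00), nrow = 3, dimnames = list(vars, vars))
female_r <- matrix(c(1.00, 0.52, 0.16,
                     0.52, 1.00, 0.08,
                     0.16, 0.08, 1.00), nrow = 3, dimnames = list(vars, vars))

# Males below the diagonal, females above
both <- male_r
both[upper.tri(both)] <- female_r[upper.tri(female_r)]
diag(both) <- NA

# Males below the diagonal, (male - female) differences above
diffs <- male_r
diffs[upper.tri(diffs)] <- (male_r - female_r)[upper.tri(male_r)]
diag(diffs) <- NA

round(both, 2)
round(diffs, 2)
```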

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero, or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, the raw probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
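The difference between raw and Holm-adjusted probabilities can be seen with the stats functions named above. This is a minimal sketch: the mtcars variables and the vector raw_p are illustrative choices, not values from the sat.act analysis.

```r
# Sketch: test one correlation, then adjust several raw p values with Holm.
# mtcars and the raw_p vector are illustrative choices only.
single_test <- cor.test(mtcars$mpg, mtcars$wt)   # Pearson by default
single_test$p.value                               # raw two-tailed probability

raw_p  <- c(0.001, 0.02, 0.03, 0.58)
holm_p <- p.adjust(raw_p, method = "holm")
holm_p   # 0.004 0.060 0.060 0.580
```

Holm multiplies the smallest p by the number of tests, the next smallest by one fewer, and so on, never letting an adjusted value fall below the one before it, so adjusted values are always at least as large as the raw ones.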

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)  #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23
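The first two cases can be reproduced by hand in base R, which may help show what r.test is doing. This is a minimal sketch using the standard t test for a correlation and the Fisher z test for two independent correlations; the variable names are illustrative, and it is not a description of how r.test is implemented internally.

```r
# Case 1: significance of a single correlation (n = 50, r = .3)
r <- 0.3
n <- 50
t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_single <- 2 * pt(-abs(t_value), df = n - 2)

# Case 2: two independent correlations (.4 and .6), each from n = 30 cases
r12 <- 0.4
r34 <- 0.6
n1 <- 30
n2 <- 30
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # standard error of the z difference
z_value <- abs(atanh(r12) - atanh(r34)) / se_diff  # atanh() is the Fisher r to z transform
p_diff <- 2 * pnorm(-abs(z_value))

round(c(t = t_value, p = p_single), 3)
round(c(z = z_value, p = p_diff), 2)
```

Rounded to two decimals, these agree with the t of 2.18 and the z of 0.99 shown in the r.test output.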

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
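The logic of the test can be sketched in base R from the rounded correlations in Table 1. This sketch assumes the statistic is (n - 3) times the sum of the squared Fisher z values of the distinct off-diagonal correlations, and it assumes a single sample size of 700 (SATQ actually has 687 cases), so it should land close to, but not exactly on, the value that cortest reports.

```r
# Correlations from Table 1, rounded to two decimals
vars <- c("gender", "education", "age", "ACT", "SATV", "SATQ")
R <- matrix(c(
   1.00,  0.09, -0.02, -0.04, -0.02, -0.17,
   0.09,  1.00,  0.55,  0.15,  0.05,  0.03,
  -0.02,  0.55,  1.00,  0.11, -0.04, -0.03,
  -0.04,  0.15,  0.11,  1.00,  0.56,  0.59,
  -0.02,  0.05, -0.04,  0.56,  1.00,  0.64,
  -0.17,  0.03, -0.03,  0.59,  0.64,  1.00),
  nrow = 6, byrow = TRUE, dimnames = list(vars, vars))
n <- 700                              # assumed common sample size

z <- atanh(R[lower.tri(R)])           # Fisher z of each distinct correlation
chi_sq <- (n - 3) * sum(z^2)          # assumed form of the test statistic
df <- length(z)                       # p * (p - 1) / 2 distinct correlations
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

c(chi_sq = chi_sq, df = df, p = p_value)
```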

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
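How much φ underestimates the latent correlation can be checked in base R for the special case where both variables are cut at their means. The sketch below relies on the standard result that, for a bivariate normal with correlation ρ cut at zero on both variables, φ = 2 asin(ρ)/π; the simulation seed and sample size are arbitrary choices made here for illustration.

```r
rho <- 0.5                             # latent correlation
phi_expected <- 2 * asin(rho) / pi     # phi when both variables are cut at 0
rho_recovered <- sin(pi * phi_expected / 2)  # inverting the relation recovers rho

# Monte Carlo check with simulated data (arbitrary seed and sample size)
set.seed(42)
n <- 100000
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)    # y correlates rho with x
phi_observed <- cor(as.numeric(x > 0), as.numeric(y > 0))

round(c(rho = rho, phi_expected = phi_expected, phi_observed = phi_observed), 2)
```

With ρ = .5 the expected φ is one third, which matches the rho = 0.5, phi = 0.33 labels produced by draw.tetra.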

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A matrix of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
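The smoothing steps described above can be sketched in base R. This is an illustration of the idea rather than the cor.smooth source: the function name smooth_cor, the eig_tol argument, and the improper example matrix are all hypothetical.

```r
# Minimal eigenvalue smoothing sketch (hypothetical helper, not psych code)
smooth_cor <- function(R, eig_tol = 1e-8) {
  e <- eigen(R, symmetric = TRUE)
  if (min(e$values) >= eig_tol) return(R)     # nothing to fix
  vals <- pmax(e$values, eig_tol)             # make the offending eigenvalues positive
  vals <- vals * ncol(R) / sum(vals)          # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  S <- cov2cor(S)                             # restore a unit diagonal
  dimnames(S) <- dimnames(R)
  S
}

# A hypothetical improper "correlation" matrix: the pattern .9, .9, -.9 is impossible
R_bad <- matrix(c( 1.0, 0.9, -0.9,
                   0.9, 1.0,  0.9,
                  -0.9, 0.9,  1.0), 3, 3, byrow = TRUE)
R_smooth <- smooth_cor(R_bad)

eigen(R_bad, symmetric = TRUE)$values      # one eigenvalue is negative
eigen(R_smooth, symmetric = TRUE)$values   # all eigenvalues are now positive
```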

                        4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use

> draw.tetra()

[Figure 14 plot: bivariate normal scatter with rho = 0.5 and phi = 0.33, cut at τ on X and at Τ on Y into four quadrants, with the marginal normal densities dnorm(x) shown for X > τ and Y > Τ]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Figure 15 plot: perspective view of the bivariate density (axes x, y, z), rho = 0.5]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.
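Equation 1 can be verified numerically in base R. The sketch below uses simulated data with a hypothetical grouping variable (four groups of 25 cases); it is not the statsBy implementation, only a check that the decomposition holds for data split into group means and deviations from those means.

```r
set.seed(1)
g <- rep(1:4, each = 25)              # hypothetical grouping variable
x <- rnorm(100) + g                   # simulated scores with group differences
y <- 0.5 * x + rnorm(100) - 0.3 * g

x_bg <- ave(x, g)                     # group mean for every case (between part)
y_bg <- ave(y, g)
x_wg <- x - x_bg                      # deviation from the group mean (within part)
y_wg <- y - y_bg

eta_x_wg <- cor(x, x_wg)              # eta: correlation of the data with each part
eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg)
eta_y_bg <- cor(y, y_bg)
r_wg <- cor(x_wg, y_wg)               # pooled within group correlation
r_bg <- cor(x_bg, y_bg)               # between group correlation, weighted by group size

r_xy <- cor(x, y)
r_decomposed <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
c(observed = r_xy, decomposed = r_decomposed)
```

Because the group means are repeated for every case, the between group correlation here is automatically weighted by group size.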

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
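Regression from a correlation matrix needs only matrix algebra, which the following base R sketch illustrates. The correlations are hypothetical and the code uses the textbook formulas (standardized weights as the inverse of the predictor correlations times the predictor-criterion correlations); it is not taken from the setCor source.

```r
# Hypothetical correlations among two predictors and one criterion
R_xx <- matrix(c(1.0, 0.5,
                 0.5, 1.0), 2, 2, byrow = TRUE,
               dimnames = list(c("x1", "x2"), c("x1", "x2")))
r_xy <- c(x1 = 0.6, x2 = 0.3)

beta <- solve(R_xx, r_xy)        # standardized weights: R_xx^{-1} r_xy
R2 <- sum(beta * r_xy)           # squared multiple correlation
vif <- diag(solve(R_xx))         # variance inflation factor, 1 / (1 - SMC)

round(beta, 2)
round(c(R = sqrt(R2), R2 = R2), 2)
round(vif, 2)
```

In this example x2 adds nothing once x1 is in the model, so its weight is zero even though its zero order correlation with the criterion is .3.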

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.69     0.63         0.50      0.58        0.48

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.48     0.40         0.25      0.34        0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.59     0.58         0.49      0.58        0.45

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.34     0.34         0.24      0.33        0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.58     0.46         0.21      0.18        0.30

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
          0.331    0.210        0.043     0.032       0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.44     0.35         0.17      0.14        0.26

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.19     0.12         0.03      0.02        0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,...i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c', respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c'; the total effect of x on y, ignoring m, is c = c' + ab. Statistical tests of the ab effect are best done by bootstrapping.
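The path logic can be sketched with ordinary lm calls in base R. Everything below is an assumption made for illustration: the data are simulated, the path values are arbitrary, and the 500 bootstrap resamples with a percentile interval are a simple stand-in for whatever resampling scheme the mediate function actually uses.

```r
set.seed(123)
n <- 200
x <- rnorm(n)                          # simulated predictor
m <- 0.5 * x + rnorm(n)                # simulated mediator
y <- 0.4 * m + 0.2 * x + rnorm(n)      # simulated criterion

a <- unname(coef(lm(m ~ x))["x"])      # path a: x -> m
fit_y <- lm(y ~ x + m)
b <- unname(coef(fit_y)["m"])          # path b: m -> y, controlling for x
c_direct <- unname(coef(fit_y)["x"])   # direct effect of x on y, controlling for m
c_total <- unname(coef(lm(y ~ x))["x"])  # total effect of x on y
ab <- a * b                            # indirect effect

# Percentile bootstrap of the indirect effect (500 resamples, chosen for speed)
boot_ab <- replicate(500, {
  i <- sample(n, replace = TRUE)
  coef(lm(m[i] ~ x[i]))[2] * coef(lm(y[i] ~ x[i] + m[i]))[3]
})
ci <- quantile(boot_ab, c(0.025, 0.975))

round(c(total = c_total, direct = c_direct, indirect = ab), 2)
round(ci, 2)
```

For least squares estimates from the same cases, the total effect equals the direct effect plus ab exactly, which is why the three quantities are usually reported together.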

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32  with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
 To see the longer output, specify short = FALSE in the print statement

 Full output

 Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

 Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

 'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

 'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

 'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c( "SATV", "SATQ"), x = c("education", "age" ), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c( "SATV" ), x = c("education", "age" ), m= "ACT", data =sat.act,std=TRUE,n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50

> mediate.diagram(preacher)

[Figure 16 plot: path diagram titled "Mediation model": THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, THERAPY to SATIS c = 0.76 and c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct effect with Attribution removed is .43), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure 17 plot: path diagram titled "Regression Models": THERAPY and ATTRIB predicting SATIS; displayed values 0.43, 0.4, and 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{yy}^{-1} R_{yx} R_{xx}^{-1} R_{xy}.$$

These eigenvalues are the squared canonical correlations between the two sets.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic based upon the average canonical correlation might be more appropriate.
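Both statistics can be computed directly from the squared canonical correlations. The base R sketch below applies the set correlation formula to the squared canonical correlations printed for the Thurstone example in section 5.1. The helper functions and the small three variable matrix are hypothetical additions for illustration, and the helper assumes the usual canonical correlation matrix product; it is not the setCor source.

```r
# Squared canonical correlations from a correlation matrix R
# (hypothetical helper; x and y are index vectors for the two sets)
squared_cancor <- function(R, x, y) {
  Rxx <- R[x, x, drop = FALSE]
  Ryy <- R[y, y, drop = FALSE]
  Rxy <- R[x, y, drop = FALSE]
  M <- solve(Ryy) %*% t(Rxy) %*% solve(Rxx) %*% Rxy
  Re(eigen(M)$values)
}

# Cohen's set correlation from the squared canonical correlations
set_correlation <- function(lambda) 1 - prod(1 - lambda)

# Values printed by setCor(y = 5:9, x = 1:4, data = Thurstone)
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)
set_R2 <- set_correlation(lambda)
avg_cc2 <- mean(lambda)
round(c(set_R2 = set_R2, average = avg_cc2), 2)

# With a single criterion the set correlation reduces to the multiple R2
R_small <- matrix(c(1.0, 0.5, 0.6,
                    0.5, 1.0, 0.3,
                    0.6, 0.3, 1.0), 3, 3, byrow = TRUE)
lambda_small <- squared_cancor(R_small, x = 1:2, y = 3)
set_correlation(lambda_small)
```

The first computation gives the 0.69 and 0.2 reported in that output. Note how one large canonical correlation (.628) dominates the set correlation, while the average squared canonical correlation stays low.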

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: first, setCor works on the correlation, covariance, or raw data matrix, and thus will report standardized or raw β weights depending on the input. Second, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the matrix C (found with cov) rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = -0.02   Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = -0.01   Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = 0   Upper CI = 0
R2 of model = 0.37
 To see the longer output, specify short = FALSE in the print statement

 Full output

 Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

 Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

 'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

 'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

 'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure 18 plot: path diagram titled "Moderation model": ACT, gender, and ACTXgndr predicting SATQ, with education as the mediating variable]

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation =  0.03
Cohen's Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
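The relationship between the lm output and the matrix-based output can be sketched in base R. This is a minimal illustration on simulated data (the variable names x1 to x3, y1 and y2 are hypothetical, and this is not the sat.act data or the setCor source code): standardized beta weights are found from the correlation matrix alone, and Cohen's set correlation is computed from determinants, which shows why it does not depend on which set is treated as the predictors.

```r
# Simulated data (hypothetical; not the sat.act data used in the text)
set.seed(42)
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
y1 <- 0.5 * x[, 1] + 0.3 * x[, 2] + rnorm(n)
y2 <- -0.4 * x[, 3] + rnorm(n)
d <- data.frame(x, y1 = y1, y2 = y2)

R   <- cor(d)            # full correlation matrix
Rxx <- R[1:3, 1:3]       # predictor intercorrelations
Rxy <- R[1:3, 4:5]       # predictor-criterion correlations
Ryy <- R[4:5, 4:5]       # criterion intercorrelations

# Standardized beta weights from the correlation matrix alone
beta <- solve(Rxx, Rxy)

# The same weights from lm() applied to standardized raw data
z <- as.data.frame(scale(d))
beta.lm <- coef(lm(cbind(y1, y2) ~ x1 + x2 + x3, data = z))[-1, ]

# Squared multiple correlation for each criterion
R2 <- colSums(beta * Rxy)

# Cohen's set correlation: 1 - |R| / (|Rxx| |Ryy|)
set.R2 <- 1 - det(R) / (det(Rxx) * det(Ryy))

# Swapping the roles of the two sets leaves the value unchanged
swap <- c(4:5, 1:3)
set.R2.rev <- 1 - det(R[swap, swap]) / (det(Ryy) * det(Rxx))

# Equivalent form based on the squared canonical correlations
rho2 <- cancor(as.matrix(d[, 1:3]), as.matrix(d[, 4:5]))$cor^2
set.R2.canon <- 1 - prod(1 - rho2)

round(beta, 2)
round(c(set.R2, set.R2.rev, set.R2.canon), 4)
```

The last line prints three identical values. The same product form applied to the squared canonical correlations reported above, 1 - (1 - 0.050)(1 - 0.033)(1 - 0.008), gives approximately 0.09, matching the reported set correlation.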

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

    Variable           MR1    MR2    MR3    h2    u2   com
    Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
    Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
    Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
    First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
    4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
    Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
    Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
    Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
    Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

    SS loadings       2.64   1.86   1.5

    MR1               1.00   0.59   0.54
    MR2               0.59   1.00   0.52
    MR3               0.54   0.52   1.00
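To make concrete what such a conversion involves, here is a minimal base R sketch that writes a small numeric data frame as a LaTeX tabular. The helper df_to_latex is hypothetical and written only for this illustration; it is not the psych df2latex function and does not reproduce its arguments or its APA formatting. The data are the first three rows of loadings from Table 2.

```r
# Hypothetical helper (not a psych function): format a numeric data frame
# as a LaTeX table, with the row names in the first column.
df_to_latex <- function(df, digits = 2, caption = "A table") {
  # Round, then format so every cell shows the same number of decimals
  cells <- format(round(as.matrix(df), digits), nsmall = digits)
  body <- apply(cells, 1, paste, collapse = " & ")
  rows <- paste0(rownames(df), " & ", body, " \\\\")
  header <- paste0("Variable & ",
                   paste(colnames(df), collapse = " & "), " \\\\")
  c("\\begin{table}[htpb]",
    paste0("\\caption{", caption, "}"),
    "\\begin{center}",
    paste0("\\begin{tabular}{l", strrep("r", ncol(df)), "}"),
    "\\hline", header, "\\hline", rows, "\\hline",
    "\\end{tabular}", "\\end{center}", "\\end{table}")
}

# First three rows of the loadings shown in Table 2
loadings <- data.frame(
  MR1 = c(0.91, 0.89, 0.83),
  MR2 = c(-0.04, 0.06, 0.04),
  MR3 = c(0.04, -0.03, 0.00),
  row.names = c("Sentences", "Vocabulary", "Sent.Completion"))

tex <- df_to_latex(loadings, caption = "Factor loadings")
cat(tex, sep = "\n")
```

A real converter also has to escape characters that are special in LaTeX (an underscore in a variable name, for example), which this sketch does not attempt.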


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
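Several of the helpers in this list wrap short calculations. The base R sketch below shows the arithmetic behind the Fisher z transformation, the geometric and harmonic means, and the block layout described for superMatrix. These lines are stand-ins written for illustration, not the psych implementations, and the example values are made up.

```r
# Fisher r-to-z transformation and its inverse
r <- 0.5
z <- atanh(r)                 # same as 0.5 * log((1 + r) / (1 - r))
r.back <- tanh(z)             # recovers the original correlation

# Geometric and harmonic means of positive values
x <- c(1, 2, 4, 8)
geo <- exp(mean(log(x)))      # fourth root of 1 * 2 * 4 * 8
har <- 1 / mean(1 / x)        # reciprocal of the mean reciprocal

# A "super matrix": A on the top left, B on the lower right, 0s elsewhere
A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 2)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))

c(z = z, geometric = geo, harmonic = har)
S
```

For these values the arithmetic mean is 3.75, while the geometric and harmonic means are smaller, which is why the choice of mean matters for ratio or rate data.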

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
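As an illustration of the kind of scaling demonstration a distance matrix such as cities supports, the sketch below runs classical multidimensional scaling with the base R function cmdscale. The four points are hypothetical coordinates chosen for the example, not the cities data, so the block runs without the psych package.

```r
# Hypothetical coordinates for four "cities" (not the cities data set)
pts <- rbind(A = c(0, 0), B = c(3, 0), C = c(3, 4), D = c(0, 4))

d <- dist(pts)                # Euclidean distances between the points
fit <- cmdscale(d, k = 2)     # classical (metric) MDS in two dimensions
d.hat <- dist(fit)            # distances reproduced by the MDS solution

round(fit, 2)                 # recovered configuration
max(abs(d - d.hat))           # reproduction error, essentially zero here
```

The recovered coordinates match the original layout only up to rotation, reflection and translation, because distances alone carry no information about orientation.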

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

    > news(Version > "1.7.0", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings.

    > sessionInfo()
    R Under development (unstable) (2017-03-05 r72309)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: macOS Sierra 10.12.4

    Matrix products: default
    BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
    LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

    locale:
    [1] C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] psych_1.7.4.21

    loaded via a namespace (and not attached):
    [1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
    [5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
    [9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

    > x <- matrix(1:120,ncol=10,byrow=TRUE)
    > colnames(x) <- paste('V',1:10,sep='')
    > new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
    > new.x
           V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
     [1,]   1   2 NA NA NA   6   7   8   9  10
     [2,]  11  12 NA NA NA  16  17  18  19  20
     [3,]  21  22 NA NA NA  26  27  28  29  30
     [4,]  31  32 33 NA NA  36  37  38  39  40
     [5,]  41  42 43 44 NA  46  47  48  49  50
     [6,]  51  52 53 54 55  56  57  58  59  60
     [7,]  61  62 63 64 65  66  67  68  69  70
     [8,]  71  72 NA NA NA  76  77  78  79  80
     [9,]  81  82 NA NA NA  86  87  88  89  90
    [10,]  91  92 NA NA NA  96  97  98  99 100
    [11,] 101 102 NA NA NA 106 107 108 109 110
    [12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.
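To make the replacement rule concrete, the following is a minimal base R sketch of the same logic, assuming the rule is simply "below the column minimum, above the maximum, or equal to the flagged value becomes NA". It is not the scrub implementation; the names cols, mins, maxs and flagged are hypothetical and used only for illustration.

```r
# Rebuild the example matrix used with scrub
x <- matrix(1:120, ncol = 10, byrow = TRUE)
colnames(x) <- paste0("V", 1:10)

# Hypothetical helper names: columns to clean and their limits
cols    <- 3:5
mins    <- c(30, 40, 50)   # one minimum per cleaned column
maxs    <- rep(70, 3)      # the same maximum for every cleaned column
flagged <- 45              # a value to be treated as missing

newx <- x
for (i in seq_along(cols)) {
  v <- newx[, cols[i]]
  # Replace anything outside [min, max], or equal to the flagged value, with NA
  v[v < mins[i] | v > maxs[i] | v == flagged] <- NA
  newx[, cols[i]] <- v
}
newx
```

Columns outside cols are left untouched, which is why V1, V2 and V6 to V10 are unchanged in the output.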

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and the results may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
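The two recodings described above can be sketched in base R. This is only an illustration of the idea, not the code used by dummy.code or char2numeric; the variable major and its values are hypothetical.

```r
# Hypothetical categorical variable with three categories
major <- factor(c("psych", "bio", "psych", "math", "bio"))

# Dummy codes: one 0/1 column per category
dummies <- sapply(levels(major), function(lv) as.integer(major == lv))
dummies

# Numeric recoding: each category is replaced by its integer code
# (levels are ordered alphabetically by default)
codes <- as.integer(major)
codes
```

Each row of dummies contains exactly one 1, marking the category that case belongs to.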

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act,d2) #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()

null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+     main="Affect varies by movies")
> dev.off()

null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75],list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")) )
> scores <- scoreItems(keys,msq[,1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+     main="Density distributions of four measures of affect")
> dev.off()

null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

[Figure: violin plot titled "Density Plot by gender for SAT V and Q"; y axis "Observed" (200 to 800); groups SATV M, SATV F, SATQ M, SATQ F]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
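The two quantities behind these displays can be sketched in base R. This is only an illustration under stated assumptions: the data are hypothetical, and the interval uses a t quantile, which may differ in detail from what error.bars computes internally.

```r
# Hypothetical scores, used only for illustration
x <- c(12, 15, 9, 14, 11, 13, 10, 16)
n <- length(x)

# Standard error of the mean and a symmetric 95% confidence interval
se.mean <- sd(x) / sqrt(n)
ci95 <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se.mean
ci95

# Standard error of a proportion, sigma_p = sqrt(p * q / N), with q = 1 - p
p <- 0.25
N <- 100
se.prop <- sqrt(p * (1 - p) / N)
se.prop
```

Because the interval is built from the mean plus or minus a fixed multiple of the standard error, it is symmetric by construction, whatever the skew of the data.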

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although this is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.)

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

[Figure: cats eye plot titled "0.95% confidence limits"; x axis "Independent Variable" (bfagree, bfcon, bfext, bfneur, bfopen); y axis "Dependent Variable"]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+     labels=c("Male","Female"),ylab="SAT score",xlab="")

[Figure: bar graph titled "0.95% confidence limits"; groups Male and Female; y axis "SAT score" (200 to 800)]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+     main="Proportion of sample by education level")

[Figure: bar graph titled "Proportion of sample by education level"; x axis "Level of Education" (M 0 to M 5); y axis "Proportion of Education Level" (0.00 to 0.30)]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function (Figure 9). For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+     cex=2,colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+     ylab="Negative Affect", pch=16,cex=2,colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure: two panels with points labeled Sad, Horror, Neutral, Happy. Left panel "Movies effect on arousal": x axis "Energetic Arousal", y axis "Tense Arousal". Right panel "Movies effect on affect": x axis "Positive Affect", y axis "Negative Affect"]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()

null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal, and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower,upper)
> round(both,2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
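The layout produced above can be imitated in base R, which may help when reading such a combined matrix. This is a sketch of the idea, not the lowerUpper code; it uses the built-in iris data as a stand-in for the two groups, and the names r1 and r2 are hypothetical.

```r
# Two groups from a built-in data set (a stand-in for males and females)
r1 <- cor(iris[iris$Species == "setosa", 1:4])      # shown below the diagonal
r2 <- cor(iris[iris$Species == "versicolor", 1:4])  # shown above the diagonal

# One group below the diagonal, the other above, NA on the diagonal
both <- r1
both[upper.tri(both)] <- r2[upper.tri(r2)]
diag(both) <- NA
round(both, 2)

# First group below the diagonal, (first - second) above the diagonal
diffs <- both
diffs[upper.tri(diffs)] <- (r1 - r2)[upper.tri(r1)]
round(diffs, 2)
```

The sign convention (first minus second) matches the sat.act output above, where 0.61 - 0.52 = 0.09 for education and age.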

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero, or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
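The relation between raw and Holm adjusted probabilities can be seen with base R alone. The sketch below is an illustration, not the corr.test code; it uses four variables from the built-in mtcars data, and the names vars and pair.names are hypothetical.

```r
# A few variables from a built-in data set, chosen only for illustration
vars <- mtcars[, c("mpg", "wt", "hp", "qsec")]

# All pairs of variable names (a 2 x 6 character matrix)
pair.names <- combn(names(vars), 2)

# Raw two-tailed p value for each pairwise Pearson correlation
raw.p <- apply(pair.names, 2, function(v) {
  cor.test(vars[[v[1]]], vars[[v[2]]])$p.value
})

# Holm correction for the six tests
adj.p <- p.adjust(raw.p, method = "holm")
round(cbind(raw = raw.p, holm = adj.p), 4)
```

The adjusted values are never smaller than the raw ones, which is why the entries above the diagonal in Table 1 are at least as large as their counterparts below it.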

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()

null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()

null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)

Correlation tests
Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8) #steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23
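The first two cases use textbook formulas that are easy to check by hand. The base R sketch below reproduces the t of case 1 and the z of case 2 from the values used above; it assumes the standard t test for a correlation and the Fisher z test for independent correlations, and the variable names are hypothetical rather than taken from r.test.

```r
# Case 1: t test of a single correlation, r = 0.3 with n = 50
n <- 50
r <- 0.3
t.value  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p.single <- 2 * pt(-abs(t.value), df = n - 2)   # two-tailed probability
round(c(t = t.value, p = p.single), 3)

# Case 2: difference between two independent correlations,
# r12 = 0.4 and r34 = 0.6, each based on 30 cases
n1 <- 30
n2 <- 30
r12 <- 0.4
r34 <- 0.6
# atanh() is the Fisher r to z transformation
se.diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
z.diff  <- (atanh(r34) - atanh(r12)) / se.diff
p.diff  <- 2 * pnorm(-abs(z.diff))              # two-tailed probability
round(c(z = z.diff, p = p.diff), 2)
```

Cases 3 and 4 involve the more elaborate formulas for dependent correlations given by Steiger (1980) and are not sketched here.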

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
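The statistic described above can be sketched in base R. This is a minimal illustration assuming the form chi square = (n - 3) times the sum of the squared Fisher z values below the diagonal, with one degree of freedom per correlation; the details of cortest may differ, and the mtcars variables are used only as an example.

```r
# Correlations among four variables from a built-in data set
R <- cor(mtcars[, c("mpg", "wt", "hp", "qsec")])
n <- nrow(mtcars)

# Fisher z transform of the correlations below the diagonal
z <- atanh(R[lower.tri(R)])

# Sum of squared z values, scaled by (n - 3), referred to chi square
chi2 <- (n - 3) * sum(z^2)
df   <- length(z)                 # p * (p - 1) / 2 correlations
p    <- pchisq(chi2, df, lower.tail = FALSE)
c(chi2 = chi2, df = df, p = p)
```

With p variables there are p(p - 1)/2 distinct correlations, which gives the 15 degrees of freedom reported for the six sat.act variables.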

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
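The smoothing steps just described can be sketched in base R. This follows the description in the text rather than the cor.smooth source, so the details (in particular the small positive floor used for the eigen values) are assumptions, and the 3 x 3 matrix is a hypothetical example built to be improper.

```r
# A hypothetical "correlation" matrix that is not positive semi-definite
R <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)
min(eigen(R, symmetric = TRUE)$values)   # negative, so R is improper

e <- eigen(R, symmetric = TRUE)
# 1. Make the offending eigen values (slightly) positive
vals <- pmax(e$values, 100 * .Machine$double.eps)
# 2. Rescale the eigen values to sum to the number of variables
vals <- vals * ncol(R) / sum(vals)
# 3. Rebuild the matrix and rescale it to have a unit diagonal
smoothed <- cov2cor(e$vectors %*% diag(vals) %*% t(e$vectors))
round(smoothed, 2)
```

The smoothed matrix keeps the pattern of the original correlations but shrinks them just enough that no eigen value is negative.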

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

> draw.tetra()

[Plot: a bivariate normal distribution (rho = 0.5, phi = 0.33) cut at τ on X and Τ on Y into the four quadrants X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; and X < τ, Y < Τ, with the marginal normal densities dnorm(x) shown for X > τ and Y > Τ.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Plot: perspective view of the bivariate density over x and y, rho = 0.5.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values, or the group means.
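Equation 1 can be checked numerically in base R. The data below are simulated and the grouping variable g is hypothetical; the sketch only shows that the within and between pieces reproduce the overall correlation and is not a substitute for statsBy.

```r
# Sketch: verify the within/between decomposition of a correlation (Equation 1).
set.seed(1)
g <- rep(1:5, each = 20)              # hypothetical grouping variable, 5 groups of 20
x <- rnorm(100) + g                   # group means differ on x
y <- 0.5 * x + rnorm(100) - 0.8 * g   # y relates to x differently within and between

xb <- ave(x, g); yb <- ave(y, g)      # between part: each case gets its group mean
xw <- x - xb;    yw <- y - yb         # within part: deviation from the group mean

eta_xwg <- cor(x, xw); eta_ywg <- cor(y, yw)   # eta within
eta_xbg <- cor(x, xb); eta_ybg <- cor(y, yb)   # eta between
r_wg    <- cor(xw, yw)                # pooled within group correlation
r_bg    <- cor(xb, yb)                # correlation of group means, weighted by group size

rhs <- eta_xwg * eta_ywg * r_wg + eta_xbg * eta_ybg * r_bg
c(observed = cor(x, y), reconstructed = rhs, within = r_wg, between = r_bg)
```

Because the group means are expanded to one value per case, cor(xb, yb) is automatically weighted by group size, which is what the decomposition requires.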

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
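To make the matrix-based computation concrete, here is a minimal base R sketch of multiple regression from a correlation matrix. It uses the built-in mtcars data and an arbitrary choice of variables purely for illustration; it shows the standard formulas (β = Rxx⁻¹ rxy, R² = β'rxy, VIF = diag(Rxx⁻¹)) and makes no claim about how setCor is implemented internally.

```r
# Sketch: standardized regression weights from a correlation matrix alone.
R  <- cor(mtcars[, c("mpg", "wt", "hp", "disp")])
yv <- "mpg"
xv <- c("wt", "hp", "disp")

beta <- solve(R[xv, xv], R[xv, yv])   # standardized beta weights
R2   <- sum(beta * R[xv, yv])         # squared multiple correlation
vif  <- diag(solve(R[xv, xv]))        # 1/(1 - SMC) for each predictor

# The same betas from lm() applied to standardized raw data
fit <- lm(mpg ~ wt + hp + disp, data = as.data.frame(scale(mtcars)))

round(rbind(from_matrix = unname(beta), from_lm = unname(coef(fit)[-1])), 3)
round(c(R2_matrix = R2, R2_lm = summary(fit)$r.squared), 3)
```

The two rows agree because standardizing the raw data turns its covariance matrix into the correlation matrix, so nothing beyond R is needed for the point estimates. Standard errors additionally require the number of observations.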

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,\ldots,i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
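The logic of the a, b, and ab paths and of the bootstrap test can be sketched in base R. Everything here is an illustrative assumption: the data are simulated, the helper name paths is hypothetical, and 500 resamples are used only to keep the example fast. The total effect is labeled c_total and the direct effect (with the mediator in the model) c_prime, matching the c and c' labels in the mediate output.

```r
# Sketch: simple mediation paths and a percentile bootstrap of the indirect effect.
set.seed(123)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)               # a path built into the simulation
y <- 0.4 * m + 0.3 * x + rnorm(n)     # b path and direct effect
d <- data.frame(x, m, y)

paths <- function(d) {
  a       <- coef(lm(m ~ x, data = d))[["x"]]     # x -> m
  fit     <- lm(y ~ x + m, data = d)
  b       <- coef(fit)[["m"]]                     # m -> y, controlling for x
  c_prime <- coef(fit)[["x"]]                     # direct effect of x on y
  c_total <- coef(lm(y ~ x, data = d))[["x"]]     # total effect of x on y
  c(a = a, b = b, c_total = c_total, c_prime = c_prime, ab = a * b)
}

est <- paths(d)

# Resample cases with replacement and recompute ab each time
boot_ab <- replicate(500, paths(d[sample(n, replace = TRUE), ])[["ab"]])
ci <- quantile(boot_ab, c(0.025, 0.975))

round(est, 2)
round(ci, 2)
```

In ordinary least squares the total effect equals the direct effect plus the indirect effect (c_total = c_prime + ab), which is why the indirect effect in the Preacher and Hayes example is the difference between the two reported THERAPY coefficients.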


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se    t   Prob
THERAPY  0.76 0.31  2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Path diagram, "Mediation model": THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, THERAPY to SATIS c = 0.76 and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct effect with Attribution in the model is .43), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

[Path diagram, "Regression Models": THERAPY and ATTRIB predicting SATIS, with path values 0.43, 0.4, and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

so that the $\lambda_i$ are the squared canonical correlations between the two sets.
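The set correlation formula can be worked through in base R. This sketch uses the built-in mtcars data with an arbitrary split into two sets; it cross-checks the eigenvalue form against the equivalent determinant form, 1 - |R| / (|Rxx| |Ryy|), and is not taken from the setCor source.

```r
# Sketch: Cohen's set correlation from the squared canonical correlations.
R  <- cor(mtcars[, c("mpg", "qsec", "wt", "hp", "disp")])
yv <- c("mpg", "qsec")                # criterion set (illustrative choice)
xv <- c("wt", "hp", "disp")           # predictor set (illustrative choice)

Rxx <- R[xv, xv]; Ryy <- R[yv, yv]; Rxy <- R[xv, yv]

# Eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations
# (padded with zeros when the x set is larger than the y set).
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M)$values)

set_R2     <- 1 - prod(1 - lambda)
set_R2_det <- 1 - det(R) / (det(Rxx) * det(Ryy))   # determinant form

# The average squared canonical correlation is a more conservative summary
avg_cc2 <- sum(lambda) / min(length(xv), length(yv))

round(c(set_R2 = set_R2, set_R2_det = set_R2_det, average = avg_cc2), 3)
```

Because the product runs over every canonical variate, one large squared canonical correlation is enough to push the set correlation close to 1, while the average stays lower.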

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized β weights rather than raw ones. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02   Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01   Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0   Upper CI = 0
R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Path diagram, "Moderation model": ACT, gender, and ACTXgndr predicting SATQ, with education as the mediator.]

Figure 18: Moderated multiple regression requires the raw data.

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3     h2     u2    com
Sentences        0.91  -0.04   0.04   0.82   0.18   1.01
Vocabulary       0.89   0.06  -0.03   0.84   0.16   1.01
SentCompletion   0.83   0.04   0.00   0.73   0.27   1.00
FirstLetters     0.00   0.86   0.00   0.73   0.27   1.00
4LetterWords    -0.01   0.74   0.10   0.63   0.37   1.04
Suffixes         0.18   0.63  -0.08   0.50   0.50   1.20
LetterSeries     0.03  -0.01   0.84   0.72   0.28   1.00
Pedigrees        0.37  -0.05   0.47   0.50   0.50   1.93
LetterGroup     -0.06   0.21   0.64   0.53   0.47   1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random  Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex  is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex  Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor  One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz  Convert a correlation to the corresponding Fisher z score.

geometric.mean  also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC  and cohen.kappa are typically used to find the reliability for raters.

headtail  combines the head and tail functions to show the first and last lines of a data set or output.

topBottom  Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia  calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep  finds the probability of replication for an F, t, or r and estimates effect size.

partial.r  partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection  will correct correlations for restriction of range.

reverse.code  will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix  Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
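Several of the helpers listed above wrap short formulas. As a minimal sketch, the base R lines below show the quantities that fisherz, geometric.mean, and harmonic.mean are described as computing; the input values are arbitrary, and the psych functions themselves may handle missing data and matrix input differently.

```r
# Sketch: the formulas behind three of the helper functions, in base R.
r <- 0.5
z <- atanh(r)                  # Fisher z: 0.5 * log((1 + r) / (1 - r))

v   <- c(1, 2, 4, 8)           # arbitrary positive values
geo <- exp(mean(log(v)))       # geometric mean: nth root of the product
har <- 1 / mean(1 / v)         # harmonic mean: reciprocal of the mean reciprocal

round(c(fisher_z = z, geometric = geo, harmonic = har, arithmetic = mean(v)), 3)
```

For positive values that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.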

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). There are also personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
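As a rough illustration of the multidimensional scaling use mentioned for cities, the sketch below applies classical scaling from base R to UScitiesD, a built-in distance object for 10 US cities. UScitiesD is a stand-in and is not the psych cities matrix; if cities is stored as a full symmetric matrix, wrapping it in as.dist() first would be the assumed equivalent.

```r
# UScitiesD is a built-in 'dist' object (10 US cities); it stands in here
# for the psych 'cities' matrix, which is not loaded in this sketch.
fit <- cmdscale(UScitiesD, k = 2)   # classical (metric) MDS in two dimensions
colnames(fit) <- c("dim1", "dim2")
round(fit, 0)

# Distances recovered from the 2-D solution should track the input closely.
recovered <- dist(fit)
fit_quality <- cor(as.vector(UScitiesD), as.vector(recovered))
fit_quality
```

Plotting fit[, 1] against fit[, 2] and labelling the points with rownames(fit) gives a map-like configuration, possibly rotated or reflected relative to a real map.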

                          9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")


                          10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                          11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                          References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.




limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels: will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.
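Since pairs.panels is described as an adaptation of the pairs help page, the general layout can be approximated with base R alone. The sketch below is an assumption-laden stand-in, not the pairs.panels code: the panel function names are hypothetical, iris is used in place of sat.act, and no ellipses or significance stars are drawn.

```r
# Diagonal panel: a histogram rescaled to fit inside the panel.
panel_hist <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))
  par(usr = c(usr[1:2], 0, 1.5))
  h <- hist(x, plot = FALSE)
  nB <- length(h$breaks)
  rect(h$breaks[-nB], 0, h$breaks[-1], h$counts / max(h$counts), col = "grey")
}

# Upper panel: print the Pearson correlation of the two variables.
panel_cor <- function(x, y, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  text(0.5, 0.5, format(cor(x, y, use = "complete.obs"), digits = 2))
}

pdf(NULL)  # draw to a null device; remove this line to see the plot
pairs(iris[1:4],
      lower.panel = panel.smooth,   # scatter plot plus lowess line
      diag.panel  = panel_hist,
      upper.panel = panel_cor,
      pch = ".")                    # small plot character for many points
dev.off()
```

With many observations, pch = "." keeps the lower panels legible, which is the same advice the text gives for pairs.panels.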

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.
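The idea behind keyed scale scores can be sketched without psych. The example below is a toy illustration under stated assumptions: the four items, the 0 to 3 response range, and the responses are made up, and reverse-keyed items are flipped as (max + min) minus the response before averaging. It does not reproduce the make.keys or scoreItems interfaces.

```r
# Toy responses on a 0-3 scale; item names and values are made up.
items <- data.frame(
  active = c(3, 1, 2),
  lively = c(2, 0, 3),
  sleepy = c(0, 3, 1),
  tired  = c(1, 2, 0)
)

# +1 keeps an item as is, -1 marks it as reverse keyed (like "-sleepy").
key <- c(active = 1, lively = 1, sleepy = -1, tired = -1)

item_min <- 0
item_max <- 3

# Reverse-keyed items are flipped as (max + min) - response.
flipped <- items
for (nm in names(key)[key < 0]) {
  flipped[[nm]] <- (item_max + item_min) - items[[nm]]
}

EA <- rowMeans(flipped[names(key)])   # scale score = mean of the keyed items
EA
```

Averaging rather than summing keeps the scale score in the same metric as the items, which matches the caption's description of each scale as an average item score.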

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).
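The two kinds of summary contrasted here can be computed directly in base R. This is a sketch with stand-in data (the built-in ToothGrowth data set, grouped by supp), not the sat.act data, and it does not draw the violin itself.

```r
vals <- split(ToothGrowth$len, ToothGrowth$supp)   # stand-in data, two groups

# What a box plot summarises: median and quartiles per group.
quartiles <- sapply(vals, quantile, probs = c(0.25, 0.5, 0.75))

# What a violin plot adds: a kernel density estimate per group.
dens <- lapply(vals, density)

quartiles
sapply(dens, function(d) d$x[which.max(d$y)])      # location of each density peak
```

A violin plot mirrors each density estimate around a vertical axis and overlays the quartiles, so both summaries are visible at once.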


> png( 'pairspanels.png' )
> sat.d2 <- data.frame(sat.act,d2) #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
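The bg argument in the Figure 2 code relies on a small indexing idiom: a logical comparison plus 1 yields index 1 or 2, which selects one of two colors per case. A base-R sketch of that idiom follows, using iris as stand-in data, squared Mahalanobis distances from mahalanobis() as a stand-in for the d2 statistics computed earlier, and an arbitrary cutoff chosen only for this toy data.

```r
x  <- iris[1:4]                                # stand-in numeric data
d2 <- mahalanobis(x, colMeans(x), cov(x))      # squared distance from the centroid

cutoff <- 13                                   # arbitrary threshold for this toy data
# (d2 > cutoff) is FALSE/TRUE; adding 1 turns it into index 1 or 2.
bg <- c("yellow", "blue")[(d2 > cutoff) + 1]
table(bg)
```

The resulting vector has one color per row, so it can be passed as bg together with a filled plot character such as pch = 21.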


> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+      main="Affect varies by movies ")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


> keys <- make.keys(msq[1:75],list(
+ EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+        "lively", "-sleepy", "-tired", "-drowsy"),
+ TA =c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+        "-placid", "-calm", "-at.rest") ,
+ PA =c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+ NAf =c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+          "upset", "hostile", "irritable" )) )
> scores <- scoreItems(keys,msq[,1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+  main ="Density distributions of four measures of affect" )
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.


> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

[Figure: Density Plot by gender for SAT V and Q. y axis: Observed, 200 to 800; groups on the x axis: SATV M, SATV F, SATQ M, SATQ F]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, and 25th and 75th percentiles, as well as the entire range and the density distribution.


                            343 Means and error bars

                            Additional descriptive graphics include the ability to draw error bars on sets of data aswell as to draw error bars in both the x and y directions for paired data These are thefunctions errorbars errorbarsby errorbarstab and errorcrosses

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
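The quantities behind these error bars can be sketched in base R. This is only an illustration of the normal theory interval and of the standard error of a proportion described above, not the psych implementation; the scores and the counts are made up.

```r
# Hypothetical data: ten scores on some scale
x <- c(12, 15, 11, 14, 13, 16, 12, 15, 14, 13)
n <- length(x)
se_mean <- sd(x) / sqrt(n)                          # standard error of the mean
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se_mean   # normal theory 95% interval
ci

# Hypothetical table cell: 30 of N = 120 cases fall in one category
N <- 120
p <- 30 / N
se_p <- sqrt(p * (1 - p) / N)                       # sigma_p = sqrt(pq/N)
se_p
```

For small samples a t quantile (qt) could replace qnorm in the interval; the normal quantile is used here only to keep the sketch close to the wording above.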

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflect any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. See the discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is outside the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

[Figure: plot titled "0.95% confidence limits"; x axis "Independent Variable" (bfagree, bfcon, bfext, bfneur, bfopen); y axis "Dependent Variable"]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.


> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+ labels=c("Male","Female"),ylab="SAT score",xlab="")

[Figure: bar graph titled "0.95% confidence limits"; groups Male and Female; y axis "SAT score" (200 to 800)]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+ main="Proportion of sample by education level")

[Figure: bar graph titled "Proportion of sample by education level"; x axis "Level of Education" (M 0 through M 5); y axis "Proportion of Education Level" (0.00 to 0.30)]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure: two panels. Left: "Movies effect on arousal", x axis "Energetic Arousal" (8 to 20), y axis "Tense Arousal" (10 to 22). Right: "Movies effect on affect", x axis "Positive Affect", y axis "Negative Affect". Points in both panels are labeled Sad, Horror, Neutral, Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png( 'bibars.png' )
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
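A similar lower triangular display can be sketched with base R alone. This is an illustration of the idea rather than the lowerCor code itself, and it uses the built-in mtcars data only because it is always available.

```r
# Pairwise Pearson correlations for three variables of a built-in data set
r <- cor(mtcars[, c("mpg", "wt", "hp")], use = "pairwise", method = "pearson")

lower <- round(r, 2)
lower[upper.tri(lower)] <- NA    # blank out the redundant upper triangle
print(lower, na.print = "")      # show only the diagonal and lower triangle
```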

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal, and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA
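The way two matrices are combined can be sketched in base R. This is a minimal illustration of the layout, not the lowerUpper code; the split of the built-in mtcars data by transmission type (am) is an arbitrary stand-in for two groups.

```r
# Correlations for two groups of a built-in data set (illustration only)
vars <- c("mpg", "wt", "hp", "qsec")
r_first  <- cor(mtcars[mtcars$am == 0, vars], use = "pairwise")
r_second <- cor(mtcars[mtcars$am == 1, vars], use = "pairwise")

# First group below the diagonal, second group above it
combined <- r_first
combined[upper.tri(combined)] <- r_second[upper.tri(r_second)]
diag(combined) <- NA
round(combined, 2)

# Difference version: first group below the diagonal,
# (first - second) above it
diffs <- r_first
diffs[upper.tri(diffs)] <- (r_first - r_second)[upper.tri(r_first)]
diag(diffs) <- NA
round(diffs, 2)
```

The sign of the difference follows the output shown above, where the entry above the diagonal equals the first (lower) matrix minus the second (upper) matrix, e.g. 0.61 - 0.52 = 0.09.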

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13).

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero, or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
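The Holm adjustment itself is available in base R through p.adjust. As a small sketch of what the correction does, using three made-up raw probabilities rather than values from any data set:

```r
# Hypothetical raw (unadjusted) probabilities from three tests
p_raw <- c(0.01, 0.04, 0.03)

# Holm (1979): multiply the smallest p by 3, the next by 2, the largest by 1,
# then force the adjusted values to be non-decreasing in the order of the raw p
p_holm <- p.adjust(p_raw, method = "holm")
p_holm          # 0.03 0.06 0.06
```

Each adjusted value is at least as large as the raw value, which is why entries above the diagonal in Table 1 are never smaller than the corresponding entries below it.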

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),


> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a  correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n =  103 ,  r12 =  0.4 ,  r23 =  0.1 ,  r13 =  0.5 )"
Test of difference between two correlated  correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)  #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23
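The first two cases can be checked by hand with base R. The sketch below uses the standard t test for a single correlation and the Fisher z transformation (atanh); it is meant to show where the numbers in cases 1 and 2 come from, not to reproduce the r.test code, and the variable names are chosen here for illustration.

```r
# Case 1: a single correlation, r = .3 with n = 50
n <- 50
r <- 0.3
t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_value), df = n - 2)                    # two-tailed
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))    # Fisher z interval
round(c(t = t_value, lower = ci[1], upper = ci[2]), 2)

# Case 2: two independent correlations, .4 and .6, each based on n = 30
n1 <- 30
n2 <- 30
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # standard error of the z difference
z_diff <- abs(atanh(0.4) - atanh(0.6)) / se_diff
p_diff <- 2 * pnorm(-z_diff)                     # two-tailed
round(c(z = z_diff, p = p_diff), 2)
```

Rounded to two decimals, these agree with the t value, confidence interval, z value, and probability printed for cases 1 and 2 above.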

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980) who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
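The logic of this test can be sketched in base R. The following is an illustration of the idea described above, applied to the built-in mtcars data rather than sat.act; the scaling of the summed squared Fisher z values by n - 3 is an assumption made here for the sketch, and the code is not the cortest implementation.

```r
dat <- mtcars[, c("mpg", "wt", "hp", "qsec")]   # built-in data, illustration only
n <- nrow(dat)
R <- cor(dat)

z <- atanh(R[lower.tri(R)])     # Fisher z of each distinct off-diagonal correlation
chi2 <- (n - 3) * sum(z^2)      # assumed scaling: each z has variance 1/(n - 3) under the null
df <- length(z)                 # p * (p - 1) / 2 distinct correlations
p_value <- pchisq(chi2, df, lower.tail = FALSE)
c(chi2 = chi2, df = df, p = p_value)
```

With six variables, as in sat.act, the same count of distinct correlations gives the df = 15 reported above.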

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of the multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix formed from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will also sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
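The steps just described can be sketched in base R. The helper smooth_cor and its eig_floor argument are hypothetical names introduced for this illustration, and the example matrix is made up; this is a sketch of the eigenvalue adjustment, not the cor.smooth code.

```r
# Hypothetical helper: eigenvalue smoothing of an improper correlation matrix
smooth_cor <- function(R, eig_floor = 1e-8) {
  e <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eig_floor)          # make the smallest eigenvalues positive
  vals <- vals * nrow(R) / sum(vals)         # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  S <- cov2cor(S)                            # restore a unit diagonal
  dimnames(S) <- dimnames(R)
  S
}

# A made-up matrix that cannot be a true correlation matrix
R_bad <- matrix(c( 1.0,  0.9,  0.9,
                   0.9,  1.0, -0.9,
                   0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)
min(eigen(R_bad, symmetric = TRUE)$values)   # negative

R_ok <- smooth_cor(R_bad)
min(eigen(R_ok, symmetric = TRUE)$values)    # now positive
round(R_ok, 2)
```

The choice of floor for the small eigenvalues is arbitrary in this sketch; a smaller floor changes the original correlations less but leaves the matrix closer to singular.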

                            4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use


> draw.tetra()

[Figure: bivariate normal scatter (axes from -3 to 3) cut at τ on X and Τ on Y into four quadrants (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with rho = 0.5 and phi = 0.33, and marginal normal densities dnorm(x) showing the regions X > τ and Y > Τ]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))

[Figure: perspective plot titled "Bivariate density rho = 0.5"; axes x, y, z]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups ($r_{wg}$) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values, or the group means.
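Equation 1 can be verified numerically with base R. The sketch below simulates two variables in five groups (the data and the variable names are invented for illustration), splits each variable into group means and deviations from those means, and checks that the two products sum to the ordinary correlation. It is not the statsBy code.

```r
set.seed(42)
g <- rep(1:5, each = 20)                  # five groups of 20 simulated cases
x <- rnorm(100) + g
y <- 0.5 * x + rnorm(100) - 0.3 * g

x_bg <- ave(x, g)                         # group mean repeated for each member
y_bg <- ave(y, g)
x_wg <- x - x_bg                          # deviation from the group mean
y_wg <- y - y_bg

eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)

r_within  <- cor(x_wg, y_wg)              # pooled within group correlation
r_between <- cor(x_bg, y_bg)              # correlation of the (weighted) group means

r_decomposed <- eta_x_wg * eta_y_wg * r_within + eta_x_bg * eta_y_bg * r_between
all.equal(cor(x, y), r_decomposed)        # TRUE
```

The identity holds exactly because deviations from group means are uncorrelated with the group means, so the covariance of x and y is the sum of the within and between covariances.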

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1 while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.69     0.63          0.50      0.58         0.48

multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.48     0.40          0.25      0.34         0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary Sent.Completion First.Letters
     3.69       3.88            3.00          1.35

Unweighted multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.59     0.58          0.49      0.58         0.45

Unweighted multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.34     0.34          0.24      0.33         0.20

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohen's Set Correlation R2 = 0.69

Unweighted correlation between the two sets = 0.73
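The beta weights, multiple R2, and VIF values in output of this kind can be recovered from a correlation matrix with a few lines of matrix algebra. The following is a minimal base-R sketch of that algebra, not the setCor source code; the 3 x 3 matrix R and the variable names x1, x2, and y are made up for illustration.

```r
# Hypothetical correlation matrix: two predictors (x1, x2) and one criterion (y).
R <- matrix(c(1.0, 0.5, 0.6,
              0.5, 1.0, 0.4,
              0.6, 0.4, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("x1", "x2", "y"), c("x1", "x2", "y")))
x <- c("x1", "x2")
y <- "y"

Rxx <- R[x, x]                   # correlations among the predictors
Rxy <- R[x, y, drop = FALSE]     # predictor-criterion correlations

beta <- solve(Rxx, Rxy)          # standardized beta weights
R2   <- colSums(beta * Rxy)      # squared multiple correlation for each y
vif  <- diag(solve(Rxx))         # variance inflation factor, 1/(1 - SMC)

print(round(beta, 2))
print(round(R2, 2))
print(round(vif, 2))
```

Because only correlations enter the calculation, the weights are standardized; standard errors would additionally need the number of observations.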

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.58     0.46          0.21      0.18         0.30

multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
            0.331    0.210         0.043     0.032        0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion First.Letters
           1.02          1.02

Unweighted multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.44     0.35          0.17      0.14         0.26

Unweighted multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.19     0.12          0.03      0.02         0.07

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21

Cohen's Set Correlation R2 = 0.42

Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
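The diagonal entries below 1 in the residual matrix are consistent with residual variances, that is, what is left of each variable once the other set has been removed. Under that reading, residual matrices of this kind follow from the usual partialling formula. The sketch below uses base R only; the matrix and the names y1, y2, and z are hypothetical, and this is not a claim about how setCor computes its residual element internally.

```r
# Hypothetical correlation matrix: two variables of interest (y1, y2), one covariate (z).
R <- matrix(c(1.0, 0.3, 0.5,
              0.3, 1.0, 0.4,
              0.5, 0.4, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("y1", "y2", "z"), c("y1", "y2", "z")))
keep <- c("y1", "y2")
z    <- "z"

# Residual (partial) covariances: R_yy - R_yz R_zz^{-1} R_zy
resid_cov <- R[keep, keep] -
  R[keep, z, drop = FALSE] %*% solve(R[z, z, drop = FALSE]) %*% R[z, keep, drop = FALSE]

# Rescaling the residual covariances to a unit diagonal gives partial correlations.
partial_r <- cov2cor(resid_cov)

print(round(resid_cov, 2))
print(round(partial_r, 2))
```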

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,\ldots,i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
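The path logic can be illustrated with ordinary regressions on simulated data. This is a minimal base-R sketch, not the mediate function: the data, the helper name paths, the seed, and the use of a percentile bootstrap with 500 resamples are all assumptions made for the example. In the output that follows in this section, the total effect is labelled c and the direct effect with the mediator included is labelled c'; the sketch uses c_total and c_prime for the same two quantities.

```r
set.seed(42)                        # arbitrary seed so the simulation is reproducible
n <- 200
x <- rnorm(n)                       # predictor
m <- 0.5 * x + rnorm(n)             # mediator, partly determined by x
y <- 0.4 * m + 0.3 * x + rnorm(n)   # criterion
d <- data.frame(x, m, y)

# Least squares estimates of the paths (hypothetical helper)
paths <- function(d) {
  a       <- coef(lm(m ~ x, data = d))[["x"]]   # x -> m
  fit_y   <- coef(lm(y ~ x + m, data = d))
  b       <- fit_y[["m"]]                       # m -> y, controlling for x
  c_prime <- fit_y[["x"]]                       # direct effect of x with m included
  c_total <- coef(lm(y ~ x, data = d))[["x"]]   # total effect of x
  c(a = a, b = b, c_total = c_total, c_prime = c_prime, ab = a * b)
}
est <- paths(d)

# Percentile bootstrap of the indirect effect ab
boot_ab <- replicate(500, paths(d[sample(n, replace = TRUE), ])[["ab"]])
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(est, 2))
print(round(ci, 2))
```

With least squares estimates the total effect decomposes exactly into the direct effect plus ab, which is why testing ab amounts to testing the mediated part of the total effect.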

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example (see Figure 18) is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000.

> mediate.diagram(preacher)

[Figure 16 image: path diagram titled "Mediation model". THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct effect is .43 once Attribution is included), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)

> setCor.diagram(preacher)

[Figure 17 image: path diagram titled "Regression Models". THERAPY to SATIS = 0.43; ATTRIB to SATIS = 0.4; 0.21 shown between the two predictors.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.
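Moderation in a regression model is usually expressed as a product term, which is what a label such as ACTXgndr in moderated output denotes. The sketch below shows the idea in base R on simulated data. It is an illustration only: the variable names, the seed, and the choice to mean-center before forming the product are assumptions, not a description of what mediate does internally.

```r
set.seed(1)                           # arbitrary seed for reproducibility
n   <- 300
x   <- rnorm(n)                       # predictor
mod <- rbinom(n, 1, 0.5)              # hypothetical two-level moderator
y   <- 0.5 * x + 0.2 * mod + 0.3 * x * mod + rnorm(n)

# Center the predictor and the moderator, then form their product.
xc  <- x - mean(x)
mc  <- mod - mean(mod)
xXm <- xc * mc                        # the moderation (interaction) term

fit_product <- lm(y ~ xc + mc + xXm)  # product entered as its own predictor
fit_formula <- lm(y ~ xc * mc)        # the same model via R's interaction syntax

print(round(coef(fit_product), 2))
```

The two fits give the same coefficients; forming the product by hand simply makes the moderation term an explicit column, which is why raw data are needed for this kind of model.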

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{xx}^{-1} R_{xy}^{-1}.$$
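As a numerical check on the product formula, the first part of the sketch below plugs in the four squared canonical correlations reported for the Thurstone example in section 5.1. The second part computes the eigenvalues from a correlation matrix of simulated data. Note the assumption in that part: it uses the standard canonical-correlation product $R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$, and it cross-checks the result against the equivalent determinant form $1 - |R| / (|R_{xx}| |R_{yy}|)$. The data and seed are arbitrary.

```r
# Part 1: worked arithmetic with the squared canonical correlations from section 5.1.
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)
set_R2 <- 1 - prod(1 - lambda)
print(round(set_R2, 2))          # 0.69, the Cohen's set correlation R2 reported there
print(round(mean(lambda), 2))    # 0.2, the average squared canonical correlation

# Part 2: the same quantity from a correlation matrix (simulated data).
set.seed(3)
dat <- matrix(rnorm(500 * 4), ncol = 4)
dat[, 3] <- dat[, 3] + 0.6 * dat[, 1]   # induce some association between the sets
dat[, 4] <- dat[, 4] + 0.4 * dat[, 2]
R <- cor(dat)
x <- 1:2                                 # first set
y <- 3:4                                 # second set

# Assumed product matrix: Rxx^-1 Rxy Ryy^-1 Ryx
M      <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
lam    <- Re(eigen(M, only.values = TRUE)$values)   # squared canonical correlations
R2_eig <- 1 - prod(1 - lam)

# Determinant form of the same statistic
R2_det <- 1 - det(R) / (det(R[x, x]) * det(R[y, y]))
print(round(c(R2_eig, R2_det), 4))
```

The determinant form is unchanged when x and y are swapped, which makes the symmetry of the set correlation easy to see.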

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic, based upon the average canonical correlation, might then be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: first, setCor works on the correlation or covariance or raw data matrix and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the covariance matrix C (found with pairwise deletion) rather than the raw data.

> C <- cov(sat.act, use="pairwise")

> model1 <- lm(ACT ~ gender + education + age, data=sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act, mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure 18 image: path diagram titled "Moderation model". Predictors ACT, gender, and ACTXgndr; mediator education; criterion SATQ. Paths to education: 0.16, 0.09, -0.01. Paths to SATQ: ACT c = 0.58, c' = 0.59; gender c = -0.14, c' = -0.14; ACTXgndr c = 0, c' = 0. education to SATQ = -0.04.]

Figure 18: Moderated multiple regression requires the raw data.

Compare this with the output from setCor.

> #compare with setCor

> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender education  age
  1.01      1.45 1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1] 3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1 5.6

Average squared canonical correlation = 0.03

Cohen's Set Correlation R2 = 0.09

Shrunken Set Correlation R2 = 0.08

F and df of Cohen's Set Correlation 7.26 9 1681.86

Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
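The agreement between the raw-data and matrix routes can be checked directly: a regression on standardized raw data and the matrix solution from the correlations give the same weights and the same R2. The following base-R sketch uses simulated data with hypothetical variable names and an arbitrary seed; it does not call setCor.

```r
set.seed(7)                          # arbitrary seed for reproducibility
n  <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 0.3 * x1 - 0.2 * x2 + rnorm(n)
d  <- data.frame(x1, x2, y)
preds <- c("x1", "x2")

# (1) Raw-data route: regress the standardized criterion on standardized predictors.
z     <- as.data.frame(scale(d))
b_lm  <- coef(lm(y ~ x1 + x2, data = z))[preds]
r2_lm <- summary(lm(y ~ x1 + x2, data = d))$r.squared

# (2) Matrix route: only the correlation matrix is used.
R      <- cor(d)
b_mat  <- solve(R[preds, preds], R[preds, "y"])
r2_mat <- sum(b_mat * R[preds, "y"])

print(round(rbind(lm = b_lm, matrix = b_mat), 3))
print(round(c(lm = r2_lm, matrix = r2_mat), 4))
```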

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00


                            7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Converts a correlation to the corresponding Fisher z score.

geometric.mean, also harmonic.mean, find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between them.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
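Several of these helpers wrap short calculations, which can be sketched in base R to show what is being computed. The lines below are illustrative equivalents based only on the descriptions in the list above, not the psych implementations; the input values are arbitrary.

```r
# Fisher r-to-z transformation (the conversion described for fisherz)
r <- 0.5
z <- atanh(r)                    # identical to 0.5 * log((1 + r) / (1 - r))

# Geometric and harmonic means of a positive vector
x  <- c(1, 2, 4)
gm <- exp(mean(log(x)))          # geometric mean
hm <- 1 / mean(1 / x)            # harmonic mean

# A "super matrix": A on the top left, B on the lower right, zeros elsewhere
A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 1)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))

print(round(c(z = z, geometric = gm, harmonic = hm), 3))
print(S)
```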

                            8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")

                            10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                            11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)  # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg = c("yellow", "blue")[(d2 > 25) + 1], pch = 21, stars = TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.
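The bg argument in the call above picks a color per point by turning a logical test into an index. A minimal base R sketch of that indexing idiom follows; the d2 values here are made-up stand-ins for the distances computed earlier, not the real ones.

```r
# Hypothetical distances standing in for the d2 values computed earlier.
d2 <- c(3.1, 40.2, 7.5, 26.0, 12.4)

# (d2 > 25) is FALSE/TRUE, which behaves as 0/1 in arithmetic;
# adding 1 turns it into index 1 or 2 into the colour vector.
point_colors <- c("yellow", "blue")[(d2 > 25) + 1]
print(point_colors)
```

Points at or below the cutoff get the first color and points above it get the second, so the cutoff (25 in the figure) is the only value that needs tuning.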

> png('affect.png')
> pairs.panels(affect[14:17], bg = c("red", "black", "white", "blue")[affect$Film], pch = 21,
+     main = "Affect varies by movies")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The colors represent four different movie conditions.

> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[, 1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother = TRUE,
+     main = "Density distributions of four measures of affect")
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6], sat.act$gender, grp.name = c("M", "F"), main = "Density Plot by gender for SAT V and Q")

[Plot: "Density Plot by gender for SAT V and Q". Violin plots for SATV M, SATV F, SATQ M and SATQ F; y axis "Observed", running from 200 to 800.]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
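The three interval options described for error.bars can be sketched by hand in base R. This is an illustration of the arithmetic only, on simulated data; it uses the normal quantile to match the "normal theory" wording above, and whether the package itself uses a normal or a t quantile is an assumption not settled by this text.

```r
set.seed(42)                              # illustrative, simulated scores only
x <- rnorm(100, mean = 500, sd = 100)

n  <- sum(!is.na(x))
m  <- mean(x, na.rm = TRUE)
s  <- sd(x, na.rm = TRUE)
se <- s / sqrt(n)                         # standard error of the mean
z  <- qnorm(0.975)                        # about 1.96 for a 95% interval (assumed normal quantile)

ci95   <- c(lower = m - z * se, upper = m + z * se)   # 95% confidence interval
one_sd <- c(lower = m - s,      upper = m + s)        # +/- one standard deviation
one_se <- c(lower = m - se,     upper = m + se)       # +/- one standard error

print(rbind(ci95, one_sd, one_se))
```

The +/- one standard deviation band describes the spread of the scores, while the other two describe uncertainty about the mean, which is why it is so much wider for any reasonably large sample.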

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. See the discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[, 6:10], epi.bfi$epilie < 4)

[Plot: "0.95% confidence limits". x axis "Independent Variable" with bfagree, bfcon, bfext, bfneur and bfopen; y axis "Dependent Variable".]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars = TRUE,
+     labels = c("Male", "Female"), ylab = "SAT score", xlab = "")

[Plot: "0.95% confidence limits". Bar graph for Male and Female; y axis "SAT score", running from 200 to 800.]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way = "both", ylab = "Proportion of Education Level", xlab = "Level of Education",
+     main = "Proportion of sample by education level")

[Plot: "Proportion of sample by education level". x axis "Level of Education" with bars labelled M 0 to M 5; y axis "Proportion of Education Level", running from 0.00 to 0.30.]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds row wise percentages. The data can be converted to percentages (as shown) or shown as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
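The three ways of forming proportions named in the caption, together with the standard error of a proportion $\sqrt{pq/N}$, can be reproduced by hand. The following base R sketch uses a small made-up table rather than the sat.act counts, and computes the standard error only for the grand-total case, since that is the one the caption spells out.

```r
# Hypothetical 2 x 3 table of counts (rows: gender, columns: education level).
tab <- matrix(c(10, 20, 30,
                15, 25, 50), nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("M", "F"), education = c("0", "1", "2")))

N <- sum(tab)                             # grand total

p_both    <- prop.table(tab)              # each cell / grand total
p_rows    <- prop.table(tab, margin = 1)  # each cell / its row total
p_columns <- prop.table(tab, margin = 2)  # each cell / its column total

# Standard error of a proportion, sqrt(p * q / N), based on the grand total.
se_both <- sqrt(p_both * (1 - p_both) / N)

print(round(p_both, 3))
print(round(se_both, 3))
```

Because the three versions divide by different totals, the same cell can look large under one and small under another, so it is worth stating which one a plot shows.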

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow = c(1, 2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data = affect[-c(1, 20)], group = "Film", labels = films,
+     xlab = "Energetic Arousal", ylab = "Tense Arousal", ylim = c(10, 22), xlim = c(8, 20), pch = 16,
+     cex = 2, colors = colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data = affect.stats, labels = films, xlab = "Positive Affect",
+     ylab = "Negative Affect", pch = 16, cex = 2, colors = colors, main = "Movies effect on affect")
> op <- par(mfrow = c(1, 1))

[Plot, left panel: "Movies effect on arousal". x axis "Energetic Arousal", y axis "Tense Arousal"; points labelled Sad, Horror, Neutral and Happy.]
[Plot, right panel: "Movies effect on affect". x axis "Positive Affect", y axis "Negative Affect"; points labelled Sad, Horror, Neutral and Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
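The descriptive statistics that a two dimensional display of this kind needs are, for each group, the mean and standard error on both measures plus the group size. The following base R sketch computes them on a tiny made-up data set; the column names and the se helper are hypothetical, and the layout of the object actually returned by errorCircles is not shown here.

```r
# Hypothetical data: two measures (x, y) and a grouping variable.
dat <- data.frame(
  group = rep(c("A", "B"), each = 4),
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(2, 2, 4, 4, 1, 3, 5, 7)
)

se <- function(v) sd(v) / sqrt(length(v))   # standard error of the mean

# One row of summary statistics per group.
groups <- split(dat[c("x", "y")], dat$group)
group_stats <- do.call(rbind, lapply(groups, function(g) {
  data.frame(n = nrow(g),
             mean_x = mean(g$x), se_x = se(g$x),
             mean_y = mean(g$y), se_y = se(g$y))
}))
print(group_stats)
```

Each row gives the center of one cross (mean_x, mean_y), the half-lengths of its arms (multiples of se_x and se_y), and the n that could scale the size of a circle.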

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).
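The idea behind a back to back histogram can be sketched in base R: bin both groups on the same breaks, then plot one group's counts as negative values so its bars extend to the left. The ages below are made up, and this is a sketch of the general technique rather than of how bi.bars is implemented.

```r
# Hypothetical ages for two groups.
age_m <- c(21, 25, 25, 33, 41, 47)
age_f <- c(19, 22, 28, 28, 36, 52, 58)

breaks <- seq(10, 60, by = 10)            # common bins so the two sides line up

count_m <- hist(age_m, breaks = breaks, plot = FALSE)$counts
count_f <- hist(age_f, breaks = breaks, plot = FALSE)$counts

# Negating one group's counts makes its bars extend to the left of zero.
mirrored <- rbind(male = -count_m, female = count_f)
print(mirrored)

if (interactive()) {
  lim <- max(count_m, count_f)
  barplot(mirrored["male", ], horiz = TRUE, xlim = c(-lim, lim),
          names.arg = head(breaks, -1), xlab = "Count", ylab = "Age (lower bin edge)")
  barplot(mirrored["female", ], horiz = TRUE, add = TRUE, axes = FALSE)
}
```

Using the same breaks for both groups matters more than the plotting call, since bars are only comparable across the center line when they cover the same age ranges.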

> data(bfi)
> png('bibars.png')
> with(bfi, bi.bars(age, gender, ylab = "Age", main = "Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
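The combined display can be mimicked in base R. The following is only a sketch of the idea, not the lowerUpper source: the helper fill_sym is hypothetical, and the correlations are the rounded male and female values printed in this section.

```r
vars <- c("education", "age", "ACT", "SATV", "SATQ")

# Hypothetical helper: build a symmetric correlation matrix from its
# lower-triangle entries, supplied in column-major order.
fill_sym <- function(lower_vals, vars) {
  m <- diag(length(vars))
  dimnames(m) <- list(vars, vars)
  m[lower.tri(m)] <- lower_vals
  m[upper.tri(m)] <- t(m)[upper.tri(m)]
  m
}

# Rounded correlations taken from the lowerCor output for each group
male   <- fill_sym(c(0.61, 0.16, 0.02, 0.08, 0.15, -0.06, 0.04, 0.61, 0.60, 0.68), vars)
female <- fill_sym(c(0.52, 0.16, 0.07, 0.03, 0.08, -0.03, -0.09, 0.53, 0.58, 0.63), vars)

# Males below the diagonal, females above it
both <- male
both[upper.tri(both)] <- female[upper.tri(female)]
diag(both) <- NA

# Males below the diagonal, (male - female) differences above it
diffs <- male
diffs[upper.tri(diffs)] <- (male - female)[upper.tri(male)]
diag(diffs) <- NA

print(round(both, 2))
print(round(diffs, 2))
```

Because the inputs are already rounded to two decimals, the differences agree with the printed table only up to rounding.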

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero, or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
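The Holm adjustment itself is available in base R through p.adjust. A minimal sketch, using three made-up raw probabilities (illustrative values only, not taken from Table 1):

```r
# Illustrative raw p values (an assumption for this example)
p_raw <- c(0.01, 0.04, 0.03)

# Holm's step-down correction: the smallest p is multiplied by 3,
# the next by 2, the largest by 1, and monotonicity is then enforced.
p_holm <- p.adjust(p_raw, method = "holm")

print(rbind(raw = p_raw, holm = p_holm))
```

The adjusted values are never smaller than the raw ones, which is why entries above the diagonal in the corr.test probability table are at least as large as their counterparts below it.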

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

depending upon the input):

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)  # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5, r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23
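The first two tests can be reproduced by hand with base R. This is a sketch using the standard textbook formulas (the t test for a single correlation and the Fisher r-to-z transform); it is not the r.test source, and the dependent-correlation tests of cases 3 and 4 are not covered.

```r
# Test 1: significance of a single correlation (n = 50, r = 0.3)
n <- 50
r <- 0.3
t_val <- r * sqrt(n - 2) / sqrt(1 - r^2)     # t with n - 2 degrees of freedom
p_t <- 2 * pt(-abs(t_val), df = n - 2)       # two-tailed probability
# 95% confidence interval via the Fisher z transform, se = 1 / sqrt(n - 3)
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

# Test 2: two independent correlations (n = n2 = 30, r12 = 0.4, r34 = 0.6)
n1 <- 30
n2 <- 30
z_diff <- atanh(0.4) - atanh(0.6)                 # difference of z-transformed r's
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # standard error of that difference
z_val <- abs(z_diff) / se_diff
p_z <- 2 * pnorm(-z_val)                          # two-tailed probability

print(round(c(t = t_val, p = p_t, lower = ci[1], upper = ci[2]), 3))
print(round(c(z = z_val, p = p_z), 2))
```

The rounded t value, confidence interval, z value, and probability can be compared with the r.test output shown for cases 1 and 2.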

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
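The statistic described above can be sketched in base R from the lower-triangle correlations printed earlier for sat.act. This is an approximation under stated assumptions: the correlations are rounded to two decimals, a single sample size of 700 is assumed (SATQ has only 687 cases), and the (n - 3) scaling of the summed squared Fisher z values is the usual form of Steiger's test rather than a copy of the cortest source.

```r
# Lower-triangle correlations of sat.act as printed by lowerCor (rounded)
r <- c(0.09, -0.02, -0.04, -0.02, -0.17,
       0.55,  0.15,  0.05,  0.03,
       0.11, -0.04, -0.03,
       0.56,  0.59,
       0.64)

n <- 700                        # assumed common sample size
z <- atanh(r)                   # Fisher r-to-z transform
chi2 <- (n - 3) * sum(z^2)      # approximately chi square if all population r = 0
df <- length(r)                 # one degree of freedom per off-diagonal element
p <- pchisq(chi2, df, lower.tail = FALSE)

cat("chi square =", round(chi2, 2), "with df =", df, "\n")
```

Because of the rounding and the single assumed n, the result lands close to, but not exactly on, the value reported by cortest.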

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
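How much φ underestimates the latent correlation can be illustrated for the special case in which both variables are cut at their means, where φ = (2/π) arcsin(ρ). The sketch below uses only base R; the simulation settings (seed, sample size) are arbitrary choices for illustration.

```r
rho <- 0.5

# Closed form for phi when a bivariate normal is dichotomized at both means
phi <- (2 / pi) * asin(rho)

# Simulation check: generate bivariate normal data, cut at 0, correlate
set.seed(123)
n <- 100000
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
phi_sim <- cor(as.numeric(x > 0), as.numeric(y > 0))

print(round(c(rho = rho, phi = phi, phi_sim = phi_sim), 2))
```

For ρ = 0.5 the closed form gives φ = 1/3, matching the rho and phi values shown in Figure 14. With cuts away from the mean the attenuation is larger, and the closed form no longer applies.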

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix built from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
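The smoothing steps just described can be sketched in base R. This is a simplified illustration, not the cor.smooth implementation: the function name smooth_sketch, the eps floor, and the example matrix are all assumptions made for this example.

```r
# Hypothetical helper following the three steps in the text
smooth_sketch <- function(R, eps = 1e-6) {
  e <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eps)               # make non-positive eigenvalues positive
  vals <- vals * nrow(R) / sum(vals)        # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  S <- cov2cor(S)                           # restore a unit diagonal
  dimnames(S) <- dimnames(R)
  S
}

# An impossible "correlation" matrix (made up): one eigenvalue is negative
R_bad <- matrix(c( 1.0,  0.9,  0.9,
                   0.9,  1.0, -0.9,
                   0.9, -0.9,  1.0), 3, 3)

R_smooth <- smooth_sketch(R_bad)

print(round(eigen(R_bad, symmetric = TRUE)$values, 3))
print(round(R_smooth, 3))
```

The smoothed matrix has a unit diagonal and strictly positive eigenvalues, so it can be passed to procedures that require a positive definite input.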

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use

> draw.tetra()

[Figure 14 plot: bivariate normal scatter of X and Y with rho = 0.5 and phi = 0.33, cut at thresholds τ (on X) and Τ (on Y) into the four quadrants X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ; marginal normal densities dnorm(x) mark the regions X > τ and Y > Τ]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20, cuts=c(0,0))

[Figure 15 plot: perspective view of the bivariate normal density (axes x, y, z); title "Bivariate density rho = 0.5"]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
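Equation 1 can be checked numerically with base R. The sketch below simulates two variables in five groups (the group structure, coefficients, and seed are arbitrary assumptions), splits each score into its group mean and its deviation from that mean, and compares the two sides of the equation. It does not call statsBy.

```r
set.seed(42)
g <- rep(1:5, each = 20)                 # grouping variable: 5 groups of 20
x <- rnorm(100) + g                      # x has between-group differences
y <- 0.5 * x + rnorm(100) - 0.3 * g      # y is related to x within and between groups

# Between-group part: each case's group mean; within-group part: the deviation
xb <- ave(x, g); xw <- x - xb
yb <- ave(y, g); yw <- y - yb

r_xy <- cor(x, y)                                   # left-hand side
within  <- cor(x, xw) * cor(y, yw) * cor(xw, yw)    # eta_xwg * eta_ywg * r_xywg
between <- cor(x, xb) * cor(y, yb) * cor(xb, yb)    # eta_xbg * eta_ybg * r_xybg

print(round(c(r_xy = r_xy, within = within, between = between,
              sum = within + between), 4))
```

The identity is exact here because the between-group correlation is computed on the group means replicated for every case, which weights each group by its size.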

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
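To see why a correlation matrix is sufficient, note that the standardized regression weights are the solution of Rxx β = Rxy and that the squared multiple correlation is R² = Rxy'β. A minimal base R sketch follows, using the rounded sat.act correlations among ACT, SATV, and SATQ reported earlier; the choice of SATQ as the criterion is only for illustration, and this is not the setCor source.

```r
vars <- c("ACT", "SATV", "SATQ")
R <- matrix(c(1.00, 0.56, 0.59,
              0.56, 1.00, 0.64,
              0.59, 0.64, 1.00), 3, 3, dimnames = list(vars, vars))

x <- c("ACT", "SATV")     # predictors
y <- "SATQ"               # criterion (illustrative choice)

beta <- solve(R[x, x], R[x, y])     # standardized beta weights
R2 <- sum(beta * R[x, y])           # squared multiple correlation
vif <- 1 / (1 - R["ACT", "SATV"]^2) # variance inflation factor, 1/(1 - SMC), two predictors

print(round(c(beta_ACT = beta[[1]], beta_SATV = beta[[2]], R2 = R2, VIF = vif), 2))
```

Standard errors and t values additionally require the number of observations, which is why setCor asks for the number of subjects when it is given a correlation matrix.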

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.69            0.63            0.50            0.58
    LetterGroup
           0.48
multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.48            0.40            0.25            0.34
    LetterGroup
           0.23
Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35
Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.59            0.58            0.49            0.58
    LetterGroup
           0.45
Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.34            0.34            0.24            0.33
    LetterGroup
           0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.58            0.46            0.21            0.18
    LetterGroup
           0.30
multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
          0.331           0.210           0.043           0.032
    LetterGroup
          0.092
Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02
Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.44            0.35            0.17            0.14
    LetterGroup
           0.26
Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.19            0.12            0.03            0.02
    LetterGroup
           0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees
FourLetterWords            0.52     0.11         0.09      0.06
Suffixes                   0.11     0.60        -0.01      0.01
LetterSeries               0.09    -0.01         0.75      0.28
Pedigrees                  0.06     0.01         0.28      0.66
LetterGroup                0.13     0.03         0.37      0.20
                LetterGroup
FourLetterWords        0.13
Suffixes               0.03
LetterSeries           0.37
Pedigrees              0.20
LetterGroup            0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x_1, x_2, ..., x_i) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
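The bootstrap logic for the ab effect can be sketched with base R alone. This is an illustration on simulated data, not the mediate implementation: the variable names, effect sizes, seed, and the use of 500 resamples with a simple percentile interval are all assumptions chosen to keep the example small.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)               # path a: x -> m
y <- 0.4 * m + 0.2 * x + rnorm(n)     # path b: m -> y, plus a direct effect of x
dat <- data.frame(x, m, y)

# Indirect effect ab: slope of m on x times slope of y on m (controlling x)
indirect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]
  b <- coef(lm(y ~ x + m, data = d))["m"]
  unname(a * b)
}

ab_hat <- indirect(dat)

# Resample cases with replacement and recompute ab each time
boot_ab <- replicate(500, indirect(dat[sample(n, replace = TRUE), ]))
ci <- quantile(boot_ab, c(0.025, 0.975))   # percentile confidence interval

print(round(c(ab = ab_hat, boot_mean = mean(boot_ab), boot_sd = sd(boot_ab),
              lower = ci[[1]], upper = ci[[2]]), 3))
```

An interval that excludes zero is taken as evidence for an indirect effect. The product ab is not normally distributed, which is the usual reason for preferring the bootstrap to a normal-theory test.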


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   SE = 0.31  t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32  t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
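The reported estimates are tied together by two simple identities: the indirect effect is the product of the a and b paths, and the total effect is the sum of the direct effect (with the mediator in the model) and the indirect effect. A short arithmetic check in R, using the rounded values from the output above:

```r
a       <- 0.82   # THERAPY -> ATTRIB
b       <- 0.40   # ATTRIB -> SATIS, controlling for THERAPY
c_prime <- 0.43   # direct effect of THERAPY on SATIS with ATTRIB in the model

ab      <- a * b           # indirect effect through ATTRIB
c_total <- c_prime + ab    # total effect of THERAPY on SATIS

print(round(c(ab = ab, total = c_total), 2))
```

Both results agree, to two decimals, with the indirect effect and the total effect in the output.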

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50

> mediate.diagram(preacher)

[Figure 16 plot: "Mediation model" path diagram; THERAPY -> ATTRIB = 0.82, ATTRIB -> SATIS = 0.4, THERAPY -> SATIS c = 0.76, c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

[Figure 17 (path diagram, "Regression Models"): THERAPY and ATTRIB predicting SATIS; values shown: 0.43, 0.4, 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.


for speed. The default number of bootstraps is 5000.
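The logic that mediate applies can be sketched in base R. The following is only an illustration on simulated data, not the psych implementation: the helper mediation_paths and all variable names are hypothetical. It estimates the total (c), direct (c') and indirect (ab) effects by ordinary regression and then bootstraps the indirect effect with 50 iterations.

```r
# Base R sketch of a simple mediation analysis (x -> m -> y) with a
# percentile bootstrap of the indirect effect. Simulated data only.
set.seed(42)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)            # a path
y <- 0.4 * m + 0.3 * x + rnorm(n)  # b path plus direct effect c'
dat <- data.frame(x, m, y)

# Hypothetical helper: total (c), direct (c') and indirect (ab) effects.
mediation_paths <- function(d) {
  total  <- unname(coef(lm(y ~ x, data = d))["x"])   # c
  a      <- unname(coef(lm(m ~ x, data = d))["x"])   # a
  fit    <- coef(lm(y ~ x + m, data = d))
  direct <- unname(fit["x"])                         # c'
  b      <- unname(fit["m"])                         # b
  c(total = total, direct = direct, indirect = a * b)
}

est <- mediation_paths(dat)

# Percentile bootstrap of ab; 50 iterations keeps the sketch fast.
n_iter <- 50
boot_ab <- replicate(n_iter, {
  idx <- sample(n, replace = TRUE)
  mediation_paths(dat[idx, ])["indirect"]
})
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(est, 3))
print(round(ci, 3))
```

With ordinary least squares the total effect decomposes exactly as c = c' + ab, which is the same arithmetic that links the .76, .43 and .33 reported for the Preacher and Hayes example.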

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}, \qquad R_{yx} = R_{xy}'.$$
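The eigenvalue definition can be checked numerically in base R. This is a sketch under stated assumptions: mtcars stands in for real data, the split into x and y sets is arbitrary, and setCor itself is not called. The nonzero eigenvalues are the squared canonical correlations, so they are compared with stats::cancor.

```r
# Set correlation from the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx (base R only).
# mtcars is a stand-in data set; the choice of variables is arbitrary.
x_vars <- c("wt", "hp", "disp")
y_vars <- c("mpg", "qsec")
R   <- cor(mtcars[, c(x_vars, y_vars)])
Rxx <- R[x_vars, x_vars]
Ryy <- R[y_vars, y_vars]
Rxy <- R[x_vars, y_vars]

M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M)$values)     # squared canonical correlations (and a zero)
set_R2 <- 1 - prod(1 - lambda)    # Cohen's set correlation

# Two cross-checks: canonical correlations and a determinant identity.
cc2    <- cancor(as.matrix(mtcars[, x_vars]), as.matrix(mtcars[, y_vars]))$cor^2
det_R2 <- 1 - det(R) / (det(Rxx) * det(Ryy))

print(round(c(set_R2 = set_R2, det_R2 = det_R2), 4))
print(round(rbind(eigen = sort(lambda, decreasing = TRUE)[1:2], cancor = cc2), 4))
```

Because the product runs over 1 − λ, a single large eigenvalue is enough to push the set correlation close to 1, whatever the remaining eigenvalues are.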

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. There are two things to notice. First, setCor works on the correlation, covariance, or raw data matrix, and thus, depending upon the input, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
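Why a correlation matrix is sufficient can be seen in a few lines of base R: the standardized weights are β = Rxx⁻¹ rxy and the squared multiple correlation is β′rxy. The sketch below uses mtcars as a stand-in data set and does not call setCor; it compares the matrix solution with lm applied to standardized raw data.

```r
# Standardized regression weights from a correlation matrix alone (base R).
# mtcars is a stand-in; psych's setCor is not called here.
vars   <- c("mpg", "wt", "hp", "qsec")
x_vars <- c("wt", "hp", "qsec")
R <- cor(mtcars[, vars])

beta <- solve(R[x_vars, x_vars], R[x_vars, "mpg"])  # beta = Rxx^-1 rxy
R2   <- sum(beta * R[x_vars, "mpg"])                # R^2  = beta' rxy

# The same quantities from the raw data, after standardizing every variable.
z       <- as.data.frame(scale(mtcars[, vars]))
fit     <- lm(mpg ~ wt + hp + qsec, data = z)
beta_lm <- coef(fit)[x_vars]

print(round(rbind(from_R = as.numeric(beta), from_lm = as.numeric(beta_lm)), 4))
print(round(c(R2_matrix = R2, R2_lm = summary(fit)$r.squared), 4))
```

Standard errors and significance tests additionally need the number of observations, which a correlation matrix does not carry.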

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure 18 (path diagram, "Moderation model"): ACT, gender and ACTXgndr predicting SATQ through education. Paths to education: 0.16, 0.09, -0.01; education → SATQ: -0.04; c and c' paths: 0.58 and 0.59 (ACT), -0.14 and -0.14 (gender), 0 and 0 (ACTXgndr); remaining values shown: -0.04, -0.07, 0.02]

Figure 18: Moderated multiple regression requires the raw data.
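The reason raw data are required is that the moderator enters as a product term, which cannot be formed from a correlation matrix of the original variables. A minimal base R sketch on simulated data follows; the variable names are hypothetical and the centering step is this sketch's choice rather than a statement about what mediate does internally.

```r
# Moderated regression as a product of centered predictors (base R, simulated data).
set.seed(1)
n <- 300
x <- rnorm(n)                       # predictor
z <- rbinom(n, 1, 0.5)              # moderator (two groups)
y <- 0.5 * x - 0.2 * z + 0.3 * x * z + rnorm(n)

# Center the predictors, then form the product term from the raw scores.
d <- data.frame(y = y, x = x - mean(x), z = z - mean(z))
d$xz <- d$x * d$z
fit_centered <- lm(y ~ x + z + xz, data = d)

# The uncentered model gives the same interaction weight; centering only
# changes the interpretation of the lower-order terms.
fit_raw <- lm(y ~ x * z)

print(round(coef(fit_centered), 3))
print(round(coef(fit_raw), 3))
```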


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
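The summary statistics in this output can be reproduced from the three squared canonical correlations alone. The following lines simply redo the arithmetic with the printed (rounded) values, so agreement is to two decimals only.

```r
# Squared canonical correlations as printed in the setCor output above.
lambda <- c(0.050, 0.033, 0.008)

set_R2 <- 1 - prod(1 - lambda)   # Cohen's set correlation
avg_cc <- mean(lambda)           # average squared canonical correlation

print(round(set_R2, 2))
print(round(avg_cc, 2))
```

The gap between the two values illustrates the earlier caution: the set correlation accumulates across all canonical variates, while the average does not.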

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex
A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
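To give a feel for what such a conversion involves, here is a small base R sketch that writes a data frame as a LaTeX tabular. The function df_to_tabular is a hypothetical helper written for this illustration; it is not df2latex or fa2latex and omits their APA formatting options. The two rows are taken from Table 2.

```r
# Hypothetical helper (not psych's df2latex): format a numeric data frame
# as a LaTeX tabular, one row of the data frame per table row.
df_to_tabular <- function(df, digits = 2) {
  header <- paste(c("Variable", names(df)), collapse = " & ")
  body <- vapply(seq_len(nrow(df)), function(i) {
    vals <- formatC(unlist(df[i, ]), format = "f", digits = digits)
    paste(c(rownames(df)[i], vals), collapse = " & ")
  }, character(1))
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(df)), "}"),
    paste0(header, " \\\\"),
    "\\hline",
    paste0(body, " \\\\"),
    "\\end{tabular}")
}

# First two rows and two factors of Table 2.
loadings <- data.frame(MR1 = c(0.91, 0.89), MR2 = c(-0.04, 0.06),
                       row.names = c("Sentences", "Vocabulary"))
out <- df_to_tabular(loadings)
cat(out, sep = "\n")
```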


                              7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Converts a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
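Several of these helpers wrap short formulas. The base R lines below are stand-ins written for illustration, not the psych code: they show what a Fisher z transform, the geometric and harmonic means, and a "super matrix" compute.

```r
# Base R equivalents of a few helper computations (illustrative stand-ins).
r <- 0.5
z <- atanh(r)                    # Fisher z = 0.5 * log((1 + r) / (1 - r))

x   <- c(1, 2, 4, 8)
geo <- exp(mean(log(x)))         # geometric mean
har <- 1 / mean(1 / x)           # harmonic mean

# "Super matrix": A on the top left, B on the lower right, zeros elsewhere.
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))

print(round(c(fisher_z = z, geometric = geo, harmonic = har), 4))
print(S)
```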

                              8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
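Multidimensional scaling of a distance matrix such as cities needs nothing beyond base R. As a sketch, the example below uses UScitiesD, a matrix of distances between 10 US cities that ships with R's datasets package, as a stand-in for the psych cities matrix, and fits a two-dimensional classical solution with cmdscale.

```r
# Classical multidimensional scaling of inter-city distances (base R).
# UScitiesD (datasets package) stands in for psych's cities matrix.
fit <- cmdscale(UScitiesD, k = 2)
colnames(fit) <- c("dim1", "dim2")

# How well do the fitted 2-D distances reproduce the original distances?
recovery <- cor(c(dist(fit)), c(UScitiesD))

print(round(fit, 1))
print(round(recovery, 3))
```

The orientation of the two dimensions is arbitrary, so a plot of the solution may appear rotated or reflected relative to a map.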

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


                              10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                              11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                              References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+      main="Affect varies by movies")
> dev.off()
null device 
          1 

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75],list(
+ EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+        "lively", "-sleepy", "-tired", "-drowsy"),
+ TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+        "-placid", "-calm", "-at.rest"),
+ PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+        "interested", "enthusiastic", "proud", "alert"),
+ NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+        "upset", "hostile", "irritable" )) )
> scores <- scoreItems(keys,msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+      main="Density distributions of four measures of affect")
> dev.off()
null device 
          1 

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

[Plot: "Density Plot by gender for SAT V and Q"; y axis: Observed (200 to 800); x axis: SATV M, SATV F, SATQ M, SATQ F]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.
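The summary values marked inside each violin (median, quartiles and range) are ordinary quantiles computed within each group. The following is a minimal base R sketch of that computation; it uses the built-in iris data rather than sat.act, and it makes no claim about how violinBy computes these values internally.

```r
# Quantiles of one variable within each group: minimum, 25th percentile,
# median, 75th percentile and maximum.
by_group <- tapply(iris$Sepal.Length, iris$Species, quantile,
                   probs = c(0, 0.25, 0.5, 0.75, 1))

# Stack the per-group vectors into a groups x quantiles matrix.
summ <- do.call(rbind, by_group)
print(summ)
```

Each row of summ corresponds to one group, so the rows play the role of the separate violins in Figure 5.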

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include ± one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
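As a rough illustration of the normal-theory intervals described above, the following base R sketch computes the mean, standard error and 95% limits for each column of a data frame without using psych. The helper name ci_mean and the use of the built-in attitude data are assumptions made for the example; error.bars itself may differ in detail (for instance, in which quantile it uses for the interval).

```r
# Hypothetical helper (not part of psych): normal-theory confidence limits
# for the mean of each column of a numeric data frame.
ci_mean <- function(x, alpha = 0.05) {
  n  <- colSums(!is.na(x))                        # observations per variable
  m  <- colMeans(x, na.rm = TRUE)                 # means
  se <- apply(x, 2, sd, na.rm = TRUE) / sqrt(n)   # standard error of the mean
  z  <- qnorm(1 - alpha / 2)                      # about 1.96 for a 95% interval
  data.frame(n = n, mean = m, se = se,
             lower = m - z * se, upper = m + z * se)
}

ci <- ci_mean(attitude[1:3])   # 'attitude' ships with base R
print(round(ci, 2))
```

Because the limits are mean ± z times the standard error, they are symmetric about the mean by construction.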

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. See the discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

[Plot: "0.95% confidence limits"; x axis: Independent Variable (bfagree, bfcon, bfext, bfneur, bfopen); y axis: Dependent Variable (0 to 150)]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+      labels=c("Male","Female"),ylab="SAT score",xlab="")

[Plot: "0.95% confidence limits"; x axis: Male, Female; y axis: SAT score (200 to 800)]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+      main="Proportion of sample by education level")

[Plot: "Proportion of sample by education level"; x axis: Level of Education (M 0, M 1, M 2, M 3, M 4, M 5); y axis: Proportion of Education Level (0.00 to 0.30)]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
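The three ways of forming proportions named in the caption, and the standard error of a proportion, can be sketched in base R with prop.table. The counts below are made up for illustration and are not the sat.act data; only the grand-total case is given a standard error here, since the text states the formula $\sigma_p = \sqrt{pq/N}$ without saying which N error.bars.tab uses for row-wise or column-wise proportions.

```r
# A small two-way table of counts (illustrative numbers only).
tab <- matrix(c(20, 30, 25, 25), nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("M", "F"), level = c("low", "high")))
N <- sum(tab)

# Proportions of the grand total (the idea behind way = "both").
p_both  <- prop.table(tab)
se_both <- sqrt(p_both * (1 - p_both) / N)   # sigma_p = sqrt(pq/N)

# Proportions within rows and within columns.
p_rows <- prop.table(tab, margin = 1)
p_cols <- prop.table(tab, margin = 2)

print(round(p_both, 3))
print(round(se_both, 3))
```

With proportions of the grand total, all cells sum to 1; with row-wise or column-wise proportions, each row or each column sums to 1 instead.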

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+      xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+      cex=2,colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+      ylab="Negative Affect", pch=16,cex=2,colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Plot, left panel: "Movies effect on arousal"; x axis: Energetic Arousal (8 to 20); y axis: Tense Arousal (10 to 22); points labeled Sad, Horror, Neutral, Happy]
[Plot, right panel: "Movies effect on affect"; x axis: Positive Affect (6 to 12); y axis: Negative Affect (2 to 10); points labeled Sad, Horror, Neutral, Happy]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
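The descriptive statistics that such a two dimensional display needs are the group sizes and, for each of the two variables, the group means and standard errors. A minimal base R sketch follows; the helper name group_stats and the use of the built-in iris data are assumptions for illustration, and the object returned by errorCircles is structured differently.

```r
# Hypothetical helper (not part of psych): sample size, mean and standard
# error of two variables within each level of a grouping variable.
group_stats <- function(x, y, group) {
  group <- factor(group)
  se <- function(v) sd(v) / sqrt(length(v))
  data.frame(
    n      = as.vector(table(group)),
    x_mean = as.vector(tapply(x, group, mean)),
    x_se   = as.vector(tapply(x, group, se)),
    y_mean = as.vector(tapply(y, group, mean)),
    y_se   = as.vector(tapply(y, group, se)),
    row.names = levels(group)
  )
}

st <- group_stats(iris$Sepal.Length, iris$Petal.Length, iris$Species)
print(round(st, 2))
```

Each row gives the centre of one group's point (x_mean, y_mean), the half-widths of its error bars on each axis, and the n that would set the relative size of its circle.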

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device 
          1 

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ 
gender     1.00                              
education  0.09  1.00                        
age       -0.02  0.55  1.00                  
ACT       -0.04  0.15  0.11  1.00            
SATV      -0.02  0.05 -0.04  0.56  1.00      
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
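A display of this kind can be approximated in base R by rounding the correlation matrix and blanking the entries above the diagonal. The sketch below uses the built-in attitude data and is only an illustration of the idea; it does not reproduce the abbreviated column labels or other formatting of lowerCor.

```r
# Pairwise-complete Pearson correlations, the defaults the text describes.
R <- cor(attitude, use = "pairwise.complete.obs", method = "pearson")

# Round to 2 digits, convert to text, and blank out the upper triangle
# so that only the lower off diagonal matrix (plus diagonal) is shown.
lower <- format(round(R, 2), nsmall = 2)
lower[upper.tri(lower)] <- ""
print(noquote(lower))
```

The full numeric matrix R remains available for further analysis, which mirrors lowerCor returning the complete matrix invisibly while printing only the lower triangle.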

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00                        
age        0.61  1.00                  
ACT        0.16  0.15  1.00            
SATV       0.02 -0.06  0.61  1.00      
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00                        
age        0.52  1.00                  
ACT        0.16  0.08  1.00            
SATV       0.07 -0.03  0.53  1.00      
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA
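In the output above, the entries above the diagonal with diff=TRUE equal the first matrix minus the second (for example, 0.61 - 0.52 = 0.09 for education and age). The following base R sketch mimics both displays; the function name combine_lower_upper and the use of the built-in iris data are assumptions for illustration, not the psych implementation.

```r
# Hypothetical stand-in for the two displays shown above.
# 'lower' and 'upper' are square matrices with the same dimensions.
combine_lower_upper <- function(lower, upper, diff = FALSE) {
  out <- lower
  above <- upper.tri(out)
  if (diff) {
    out[above] <- (lower - upper)[above]   # first matrix minus second
  } else {
    out[above] <- upper[above]             # second matrix as is
  }
  diag(out) <- NA                          # the diagonal is not informative
  out
}

r1 <- cor(iris[iris$Species == "setosa", 1:4])
r2 <- cor(iris[iris$Species == "versicolor", 1:4])
both  <- combine_lower_upper(r1, r2)
diffs <- combine_lower_upper(r1, r2, diff = TRUE)
print(round(both, 2))
print(round(diffs, 2))
```

Values below the diagonal always come from the first matrix, so the two groups can be compared cell by cell across the diagonal.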

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero, or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
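The layout described above, with raw probabilities below the diagonal and Holm-adjusted probabilities above it, can be sketched with cor.test and p.adjust from the stats package. The example uses four variables from the built-in attitude data; it illustrates the idea only and is not the corr.test implementation.

```r
vars <- attitude[1:4]
k <- ncol(vars)

# Raw two-tailed p values from cor.test for every pair of variables.
p_raw <- matrix(0, k, k, dimnames = list(names(vars), names(vars)))
for (i in 1:(k - 1)) {
  for (j in (i + 1):k) {
    p <- cor.test(vars[[i]], vars[[j]])$p.value
    p_raw[i, j] <- p_raw[j, i] <- p
  }
}

# Holm adjustment over the k(k-1)/2 unique tests, placed above the diagonal;
# the raw values stay below the diagonal.
p_display <- p_raw
above <- upper.tri(p_display)
p_display[above] <- p.adjust(p_raw[above], method = "holm")
print(round(p_display, 3))
```

Because the Holm procedure never decreases a p value, each entry above the diagonal is at least as large as its mirror image below the diagonal.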

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device 
          1 

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device 
          1 

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device 
          1 

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
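The entries above the diagonal in Table 1 are Holm-adjusted probabilities. As a minimal sketch of what that adjustment does, base R's p.adjust can be applied to a vector of raw p values. The three values below are hypothetical and are not taken from the sat.act data.

```r
# Hypothetical raw p values for three correlations (illustration only)
p_raw <- c(0.01, 0.02, 0.03)

# Holm (1979): multiply the smallest p by m, the next by m - 1, and so on,
# then force the adjusted values to be non-decreasing and cap them at 1
p_holm <- p.adjust(p_raw, method = "holm")
print(p_holm)
```

The adjusted values can never be smaller than the raw ones, which is why many of the adjusted entries in a table of this kind end up at 1.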


                                depending upon the input

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18 with probability < 0.034
 and confidence interval 0.02 0.53
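The numbers in case 1 can be checked by hand. The sketch below uses only base R and assumes the usual t test for a correlation (n - 2 degrees of freedom) together with a Fisher z confidence interval; the variable names are chosen here for illustration.

```r
n <- 50
r <- 0.3

# t test of a single correlation against zero
t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_value), df = n - 2)

# 95% confidence interval via the Fisher z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(c(t = t_value, p = p_value, lower = ci[1], upper = ci[2]), 3))
```

Rounded to two decimals, the t value and the interval agree with the r.test output shown above.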

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99 with probability 0.32
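Case 2 is equally easy to reproduce. This is a minimal base R sketch of the difference between two Fisher z transformed correlations divided by its standard error; the variable names are illustrative.

```r
n1 <- 30
n2 <- 30          # n2 = n when not specified
r12 <- 0.4
r34 <- 0.6

z_diff  <- atanh(r12) - atanh(r34)              # difference of Fisher z scores
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # standard error of that difference
z_value <- z_diff / se_diff
p_value <- 2 * pnorm(-abs(z_value))

print(round(c(z = abs(z_value), p = p_value), 2))
```

The absolute z value and the two-tailed probability match the output above to two decimals.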

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89 with probability < 0.37
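For case 3, the sketch below assumes that the statistic is the Williams form of the t test for two dependent correlations that share a variable, which Steiger (1980) discusses for this case. It compares r12 with r13 and uses n - 3 degrees of freedom. This is an assumption about the formula rather than a statement of how r.test is implemented, but it does reproduce the t value shown above.

```r
n   <- 103
r12 <- 0.4
r13 <- 0.5
r23 <- 0.1

# determinant of the 3 x 3 correlation matrix
determinant <- 1 - r12^2 - r23^2 - r13^2 + 2 * r12 * r23 * r13
r_bar <- (r12 + r13) / 2

# assumed Williams t for the difference between r12 and r13
t_value <- (r12 - r13) *
  sqrt((n - 1) * (1 + r23) /
       (2 * (n - 1) / (n - 3) * determinant + r_bar^2 * (1 - r23)^3))
p_value <- 2 * pt(-abs(t_value), df = n - 3)

print(round(t_value, 2))
```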

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)  #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5, r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
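The logic of the test can be sketched in a few lines of base R. The helper below is hypothetical (steiger_chi2 is not a psych function) and assumes the Fisher z version of the statistic: the sum of the squared z scores of the unique off-diagonal correlations, multiplied by n - 3, with one degree of freedom per correlation. It is run on simulated independent variables, so no particular result is claimed.

```r
# Hypothetical helper: Steiger (1980) style test that all correlations are zero
steiger_chi2 <- function(R, n) {
  z    <- atanh(R[lower.tri(R)])     # Fisher z of each unique off-diagonal r
  chi2 <- (n - 3) * sum(z^2)
  df   <- length(z)
  list(chi2 = chi2, df = df, p = pchisq(chi2, df, lower.tail = FALSE))
}

set.seed(42)
x   <- matrix(rnorm(200 * 4), ncol = 4)   # simulated, independent variables
res <- steiger_chi2(cor(x), n = 200)
print(res)
```

With 6 variables there are 15 unique correlations, which is where the df = 15 in the sat.act output comes from.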

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous, although latent, variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
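How much φ underestimates the latent correlation can be worked out exactly in one special case. The sketch below assumes a bivariate normal distribution with both variables cut at their means, where the probability that both are above the cut is 1/4 + asin(rho)/(2*pi). It is an illustration of the idea only, not the estimation procedure used for arbitrary cut points, which requires numerical optimization.

```r
rho <- 0.5                               # latent (bivariate normal) correlation

# proportion of cases above both cuts when the cuts are at the means
p11 <- 0.25 + asin(rho) / (2 * pi)

# phi is the Pearson r of the resulting 2 x 2 table (all marginals are 0.5)
phi <- (p11 - 0.5 * 0.5) / sqrt(0.5 * 0.5 * 0.5 * 0.5)

# for cuts at the means the relationship can be inverted in closed form
rho_recovered <- sin(pi * phi / 2)

print(round(c(phi = phi, rho = rho_recovered), 2))
```

A latent correlation of 0.5 thus appears as a φ of about 0.33 when both variables are split at the mean.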

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix formed from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
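The smoothing steps just described can be sketched in base R. The function smooth_sketch below is a hypothetical illustration of those steps, not the cor.smooth source, and the 3 x 3 input matrix is made up so that it has one negative eigen value.

```r
# Hypothetical helper that follows the steps described in the text
smooth_sketch <- function(R, eps = .Machine$double.eps) {
  e    <- eigen(R, symmetric = TRUE)
  vals <- e$values
  vals[vals < eps] <- 100 * eps            # replace non-positive eigen values
  vals <- vals * nrow(R) / sum(vals)       # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  S <- cov2cor(S)                          # put 1s back on the diagonal
  dimnames(S) <- dimnames(R)
  S
}

# Made-up improper "correlation" matrix (its smallest eigen value is -0.8)
R_bad <- matrix(c( 1.0, 0.9, -0.9,
                   0.9, 1.0,  0.9,
                  -0.9, 0.9,  1.0), nrow = 3)
print(min(eigen(R_bad, symmetric = TRUE)$values))

R_ok <- smooth_sketch(R_bad)
print(round(R_ok, 2))
```

After smoothing, the offending eigen value is essentially zero rather than negative, and the matrix can be used by functions that require a proper correlation matrix.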

> draw.tetra()

[Figure: bivariate normal scatter with rho = 0.5 and phi = 0.33, cut at thresholds τ on X and Τ on Y. The four quadrants are labeled X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; and X < τ, Y < Τ. Marginal normal densities, dnorm(x), are drawn along each axis with the regions X > τ and Y > Τ marked.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Figure: perspective plot titled "Bivariate density, rho = 0.5", with axes x, y, and z.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values, or the group means.
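Equation 1 is an identity, and it can be verified numerically. The sketch below uses simulated data with a made-up grouping variable and only base R. The between group scores are the group means repeated for each observation (so the between group correlation is weighted by group size), and the within group scores are deviations from those means.

```r
set.seed(1)
g <- rep(1:5, each = 20)                 # hypothetical grouping variable
x <- rnorm(100) + g                      # simulated data with group differences
y <- 0.5 * x + rnorm(100) - 0.8 * g

x_bg <- ave(x, g)                        # group mean for every observation
y_bg <- ave(y, g)
x_wg <- x - x_bg                         # deviation from the group mean
y_wg <- y - y_bg

eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)
r_wg <- cor(x_wg, y_wg)                  # pooled within group correlation
r_bg <- cor(x_bg, y_bg)                  # weighted between group correlation

r_xy      <- cor(x, y)
r_rebuilt <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
print(c(observed = r_xy, rebuilt = r_rebuilt))
```

The two printed values agree, even though the within and between group correlations can have opposite signs, as they do by construction in this simulation.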

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.69     0.63         0.50      0.58        0.48

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.48     0.40         0.25      0.34        0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary SentCompletion FirstLetters
     3.69       3.88           3.00         1.35

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.59     0.58         0.49      0.58        0.45

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.34     0.34         0.24      0.33        0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
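Regression from a correlation matrix needs nothing more than matrix algebra. The sketch below is a minimal base R illustration with a made-up 3 x 3 correlation matrix (two predictors and one criterion); it shows the standard formulas for the beta weights, the squared multiple correlation, and the VIF, and does not call setCor.

```r
# Hypothetical correlation matrix: predictors x1, x2 and criterion y
R <- matrix(c(1.0, 0.3, 0.5,
              0.3, 1.0, 0.4,
              0.5, 0.4, 1.0), nrow = 3,
            dimnames = list(c("x1", "x2", "y"), c("x1", "x2", "y")))
x <- c("x1", "x2")
y <- "y"

beta <- solve(R[x, x], R[x, y, drop = FALSE])    # standardized beta weights
R2   <- colSums(beta * R[x, y, drop = FALSE])    # squared multiple correlation
smc  <- 1 - 1 / diag(solve(R[x, x]))             # SMC of each predictor with the others
vif  <- 1 / (1 - smc)                            # VIF = 1/(1 - SMC)

print(round(beta, 2))
print(round(c(R2 = unname(R2)), 2))
print(round(vif, 2))
```

Because only the correlations are needed, the same calculation works when the raw data are not available, which is the situation setCor is designed for.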

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.58     0.46         0.21      0.18        0.30

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
          0.331    0.210        0.043     0.032       0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion FirstLetters
          1.02         1.02

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.44     0.35         0.17      0.14        0.26

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.19     0.12         0.03      0.02        0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,\ldots,i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
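The path logic can be sketched with three ordinary regressions and a simple resampling loop. Everything below is simulated and uses only base R; the percentile bootstrap with 500 resamples is an assumption made here for illustration and is not a description of how mediate does its bootstrap.

```r
set.seed(123)
n <- 200
x <- rnorm(n)                          # simulated predictor
m <- 0.5 * x + rnorm(n)                # simulated mediator
y <- 0.3 * x + 0.4 * m + rnorm(n)      # simulated criterion
dat <- data.frame(x, m, y)

# indirect effect: path a (x -> m) times path b (m -> y, controlling for x)
indirect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]
  b <- coef(lm(y ~ x + m, data = d))["m"]
  unname(a * b)
}

total  <- unname(coef(lm(y ~ x, data = dat))["x"])      # effect of x ignoring m
direct <- unname(coef(lm(y ~ x + m, data = dat))["x"])  # effect of x with m in the model
ab     <- indirect(dat)

# percentile bootstrap of ab (resample cases with replacement)
boot_ab <- replicate(500, indirect(dat[sample(n, replace = TRUE), ]))
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(c(total = total, direct = direct, ab = ab), 2))
print(round(ci, 2))
```

For least squares estimates the total effect equals the direct effect plus ab, which is also the pattern in the Preacher and Hayes example reported in this section (0.76 = 0.43 + 0.33).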


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model". THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS: c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models". THERAPY to SATIS = 0.43; ATTRIB to SATIS = 0.4; the value linking THERAPY and ATTRIB is 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^{2} = 1 - \prod_{i=1}^{n} (1 - \lambda_{i})$$

where $\lambda_{i}$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic based upon the average canonical correlation might be more appropriate.
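The contrast between the two statistics can be seen by plugging in the four squared canonical correlations reported in section 5.1 for the Thurstone example. The sketch below uses only base R and those four reported values.

```r
# Squared canonical correlations reported for setCor(y = 5:9, x = 1:4, data = Thurstone)
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)

set_R2 <- 1 - prod(1 - lambda)    # Cohen's set correlation R2
avg_cc <- mean(lambda)            # average squared canonical correlation

print(round(c(set_R2 = set_R2, average = avg_cc), 2))
```

One large canonical correlation is enough to push the set correlation to about 0.69, while the average of the four values is only about 0.2.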

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. There are two things to notice. First, setCor works on the correlation, covariance, or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Second, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

                                For this example the analysis is done on the correlation matrix rather than the rawdata

                                gt C lt- cov(satactuse=pairwise)

                                gt model1 lt- lm(ACT~ gender + education + age data=satact)

                                gt summary(model1)

                                Call

                                lm(formula = ACT ~ gender + education + age data = satact)

                                Residuals

                                44

                                Call mediate(y = c(SATQ) x = c(ACT) m = education data = satact

                                mod = gender niter = 50 std = TRUE)

                                The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

                                Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

                                Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

                                Indirect effect (ab) of ACT on SATQ through education = -001

                                Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

                                Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

                                Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

                                Indirect effect (ab) of gender on SATQ through education = 0

                                Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

                                Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

                                Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

                                Indirect effect (ab) of ACTXgndr on SATQ through education = 0

                                Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

                                R2 of model = 037

                                To see the longer output specify short = FALSE in the print statement

                                Full output

                                Total effect estimates (c)

                                SATQ se t Prob

                                ACT 058 003 1925 000e+00

                                gender -014 003 -478 210e-06

                                ACTXgndr 000 003 002 985e-01

                                Direct effect estimates (c)SATQ se t Prob

                                ACT 059 003 1926 000e+00

                                gender -014 003 -463 437e-06

                                ACTXgndr 000 003 001 992e-01

                                a effect estimates

                                education se t Prob

                                ACT 016 004 422 277e-05

                                gender 009 004 250 128e-02

                                ACTXgndr -001 004 -015 883e-01

                                b effect estimates

                                SATQ se t Prob

                                education -004 003 -145 0147

                                ab effect estimates

                                SATQ boot sd lower upper

                                ACT -001 -001 001 0 0

                                gender 000 000 000 0 0

                                ACTXgndr 000 000 000 0 0

                                Moderation model

                                ACT

                                gender

                                ACTXgndr

                                SATQ

                                education016 c = 058

                                c = 059

                                009 c = minus014

                                c = minus014

                                minus001 c = 0

                                c = 0

                                minus004

                                minus004

                                minus007

                                002

                                Figure 18 Moderated multiple regression requires the raw data

                                45

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
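The adjusted R-squared and the F statistic in this summary follow directly from the multiple R-squared, the number of predictors, and the residual degrees of freedom. The following is a minimal sketch of that arithmetic in base R, using the rounded values as printed; the variable names are illustrative and are not part of psych or of the lm output.

```r
# Values read from the lm summary (rounded as printed)
r2       <- 0.0272          # multiple R-squared
k        <- 3               # predictors: gender, education, age
df_resid <- 696             # residual degrees of freedom
n        <- df_resid + k + 1  # number of observations (700)

# Adjusted (shrunken) R-squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F test of the regression and its p-value
f_stat <- (r2 / k) / ((1 - r2) / df_resid)
p_val  <- pf(f_stat, k, df_resid, lower.tail = FALSE)

print(round(c(adj_r2 = adj_r2, F = f_stat), 4))
print(signif(p_val, 3))
```

Because the inputs are rounded, the results agree with the printed summary only to about three significant digits.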

Compare this with the output from setCor:

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation =  0.03
Cohen's Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
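The set correlation can be recovered from the squared canonical correlations, which is one way to see why it is symmetric. The sketch below uses base R only and takes the rounded squared canonical correlations from the printed setCor output; the formula is Cohen's (1982) set correlation, 1 minus the product of (1 - squared canonical correlation). The simulated data and all variable names are illustrative assumptions, not part of psych.

```r
# Squared canonical correlations as printed by setCor (rounded)
rho2 <- c(0.050, 0.033, 0.008)

# Cohen's set correlation R2 and the average squared canonical correlation
set_r2   <- 1 - prod(1 - rho2)
avg_rho2 <- mean(rho2)
print(round(c(set_r2 = set_r2, avg_rho2 = avg_rho2), 2))

# Symmetry check on simulated data: the canonical correlations are the
# same whichever set is treated as the predictors
set.seed(42)
x <- matrix(rnorm(300), ncol = 3)
y <- x %*% matrix(runif(9, -0.3, 0.3), 3) + matrix(rnorm(300), ncol = 3)
cc_xy  <- cancor(x, y)$cor
cc_yx  <- cancor(y, x)$cor
sym_ok <- isTRUE(all.equal(cc_xy, cc_yx))
print(sym_ok)
```

Since the set correlation depends on the two sets only through their canonical correlations, swapping the roles of the sets leaves it unchanged.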

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00


                                7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
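To make a few of these descriptions concrete, the following sketch reproduces the underlying calculations for the Fisher z transformation, the geometric and harmonic means, and the block structure described for superMatrix. It uses base R only and is not the psych implementation; the psych functions take care of details such as names and missing data that are ignored here, and all object names are illustrative.

```r
# Fisher z transformation of a correlation
r <- 0.5
z <- 0.5 * log((1 + r) / (1 - r))   # the same quantity as atanh(r)

# Geometric and harmonic means of a set of positive values
x <- c(1, 2, 4)
geo_mean  <- exp(mean(log(x)))
harm_mean <- 1 / mean(1 / x)

# Block structure: A on the top left, B on the lower right, 0s elsewhere
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
S <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
S[seq_len(nrow(A)), seq_len(ncol(A))] <- A
S[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B

print(round(c(z = z, geo_mean = geo_mean, harm_mean = harm_mean), 3))
print(S)
```

With psych loaded, the corresponding calls would be fisherz(r), geometric.mean(x), harmonic.mean(x), and superMatrix(A, B).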

                                8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

                                9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0",package="psych")


                                10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                11 SessionInfo

                                This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0    parallel_3.4.0    tools_3.4.0       foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131     mnormt_1.5-4      grid_3.4.0
[9] lattice_0.20-34


                                References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis. 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


• Jump starting the psych package: a guide for the impatient
• Psychometric functions are summarized in the second vignette
• Overview of this and related documents
• Getting started
• Basic data analysis
  • Getting the data by using read.file
  • Data input from the clipboard
  • Basic descriptive statistics
    • Outlier detection using outlier
    • Basic data cleaning using scrub
    • Recoding categorical variables into dummy coded variables
  • Simple descriptive graphics
    • Scatter Plot Matrices
    • Density or violin plots
    • Means and error bars
    • Error bars for tabular data
    • Two dimensional displays of means and errors
    • Back to back histograms
    • Correlational structure
    • Heatmap displays of correlational structure
  • Testing correlations
  • Polychoric, tetrachoric, polyserial, and biserial correlations
• Multilevel modeling
  • Decomposing data into within and between level correlations using statsBy
  • Generating and displaying multilevel data
  • Factor analysis by groups
• Multiple Regression, mediation, moderation, and set correlations
  • Multiple regression from data or correlation matrices
  • Mediation and Moderation analysis
  • Set Correlation
• Converting output to APA style tables using LaTeX
• Miscellaneous functions
• Data sets
• Development version and a users guide
• Psychometric Theory
• SessionInfo

> keys <- make.keys(msq[1:75],list(
+ EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+        "lively", "-sleepy", "-tired", "-drowsy"),
+ TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+        "-placid", "-calm", "-at.rest"),
+ PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+        "interested", "enthusiastic", "proud", "alert"),
+ NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+        "upset", "hostile", "irritable" )) )
> scores <- scoreItems(keys,msq[,1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+ main ="Density distributions of four measures of affect" )
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

[Figure: violin plot titled "Density Plot by gender for SAT V and Q"; y axis "Observed" (200 to 800); groups SATV M, SATV F, SATQ M, SATQ F.]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include ± one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot, which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but groups the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
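The two quantities behind these error bars can be sketched in base R. This is a minimal illustration, not the package's own code: the helper names se_proportion and normal_ci are hypothetical, the input values are made up, and the interval uses a normal quantile (the package may use a different multiplier, such as one based on t).

```r
# Standard error of a proportion: sigma_p = sqrt(p * q / N), with q = 1 - p.
se_proportion <- function(p, n) {
  sqrt(p * (1 - p) / n)
}

# Normal-theory confidence interval for a mean: mean +/- z * standard error.
normal_ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                      # drop missing values
  se <- sd(x) / sqrt(length(x))          # standard error of the mean
  z <- qnorm(1 - (1 - level) / 2)        # about 1.96 for a 95% interval
  c(lower = mean(x) - z * se, mean = mean(x), upper = mean(x) + z * se)
}

# Hypothetical values, chosen only for illustration.
se_p <- se_proportion(p = 0.25, n = 100)
ci <- normal_ci(c(10, 12, 9, 11, 13, 10, 12))
print(se_p)
print(ci)
```

Because the interval is built as the mean plus or minus a fixed multiple of the standard error, it is always symmetric about the mean, whatever the shape of the data.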

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. For a discussion of the problems in presenting data this way, see http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which are always assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

[Figure: plot titled "0.95% confidence limits"; x axis "Independent Variable" (bfagree, bfcon, bfext, bfneur, bfopen); y axis "Dependent Variable" (0 to 150).]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+ labels=c("Male","Female"),ylab="SAT score",xlab="")

[Figure: bar graph titled "0.95% confidence limits"; x axis groups Male and Female; y axis "SAT score" (200 to 800).]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+ main="Proportion of sample by education level")

[Figure: bar graph titled "Proportion of sample by education level"; x axis "Level of Education" (M 0, M 1, M 2, M 3, M 4, M 5); y axis "Proportion of Education Level" (0.00 to 0.30).]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds row wise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
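The three ways of forming proportions from a table can be sketched with the base R function prop.table. This is only an illustration of the arithmetic under stated assumptions: the counts are hypothetical, and the standard errors are computed from the grand total, which is an assumption about how the "both" case is handled rather than a statement about the package's internals.

```r
# Hypothetical 2 x 3 table of counts (rows: gender; columns: education level).
counts <- matrix(c(20, 30, 25, 25, 40, 60), nrow = 2,
                 dimnames = list(gender = c("M", "F"),
                                 education = c("0", "1", "2")))
n_total <- sum(counts)

p_both <- prop.table(counts)              # proportions of the grand total
p_rows <- prop.table(counts, margin = 1)  # row wise proportions
p_cols <- prop.table(counts, margin = 2)  # column wise proportions

# Standard error of each grand-total proportion: sqrt(p * q / N).
se_both <- sqrt(p_both * (1 - p_both) / n_total)

print(round(p_both, 3))
print(round(se_both, 3))
```

With grand-total proportions all cells sum to 1; with row wise or column wise proportions each row or each column sums to 1 instead, so the choice changes what the bars can be compared against.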

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure, left panel: "Movies effect on arousal"; x axis "Energetic Arousal" (8 to 20); y axis "Tense Arousal" (10 to 22); points labeled Sad, Horror, Neutral, Happy. Right panel: "Movies effect on affect"; x axis "Positive Affect" (6 to 12); y axis "Negative Affect" (2 to 10); points labeled Sad, Horror, Neutral, Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use="pairwise", method="pearson" as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA
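The combination of two matrices into one display can be sketched in base R. This is an illustrative reimplementation under stated assumptions, not the lowerUpper source: the helper name combine_lower_upper is hypothetical, both inputs are assumed to be full symmetric correlation matrices, and the inputs below are just the education, age, and ACT entries from the male and female output above.

```r
# Put `lower` below the diagonal and `upper` (or lower - upper) above it.
combine_lower_upper <- function(lower, upper, diff = FALSE) {
  stopifnot(all(dim(lower) == dim(upper)))
  out <- lower
  above <- upper.tri(out)                 # logical mask for cells above the diagonal
  out[above] <- if (diff) (lower - upper)[above] else upper[above]
  diag(out) <- NA                         # the diagonal carries no information here
  out
}

vars <- c("education", "age", "ACT")
male <- matrix(c(1, .61, .16,  .61, 1, .15,  .16, .15, 1), 3, 3,
               dimnames = list(vars, vars))
female <- matrix(c(1, .52, .16,  .52, 1, .08,  .16, .08, 1), 3, 3,
                 dimnames = list(vars, vars))

both <- combine_lower_upper(male, female)
diffs <- combine_lower_upper(male, female, diff = TRUE)
print(round(both, 2))
print(round(diffs, 2))
```

The education and age cell of diffs is 0.61 - 0.52 = 0.09, which agrees with the corresponding entry in the output above.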

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
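The radial encoding used by a spider plot can be sketched as follows. The linear rescaling (r + 1) / 2 is an assumption consistent with the endpoints given above (length 0 at r = -1, length 1 at r = 1); the helper name radial_length and the correlations are hypothetical, and this is not the spider function itself.

```r
# Map a correlation in [-1, 1] to a radial length in [0, 1] (assumed linear).
radial_length <- function(r) (r + 1) / 2

# Hypothetical correlations of one variable with five others.
r <- c(-1, -0.5, 0, 0.5, 1)
len <- radial_length(r)

# Place each variable at an equally spaced angle around the circle.
angles <- seq(0, 2 * pi, length.out = length(r) + 1)[-1]
coords <- data.frame(r = r, length = len,
                     x = len * cos(angles), y = len * sin(angles))
print(coords)
```

Under this mapping a correlation of zero sits halfway out along its spoke, so a circumplex shows up as a smooth closed curve rather than a jagged one.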

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p, the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
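The Holm adjustment itself is available in base R through p.adjust. The sketch below applies it to a hypothetical vector of raw probabilities and repeats the calculation by hand to show the step-down logic; the p values are made up for illustration.

```r
# Hypothetical raw (unadjusted) two-tailed probabilities.
p_raw <- c(0.030, 0.001, 0.200, 0.012, 0.040)

# Holm correction as provided by the stats package.
p_holm <- p.adjust(p_raw, method = "holm")

# The same thing by hand: sort, multiply the i-th smallest p by (m - i + 1),
# enforce monotonicity with a running maximum, cap at 1, and restore the order.
m <- length(p_raw)
o <- order(p_raw)
stepped <- pmin(1, cummax((m - seq_len(m) + 1) * p_raw[o]))
p_manual <- stepped[order(o)]

print(rbind(raw = p_raw, holm = p_holm, manual = p_manual))
```

An adjusted probability is never smaller than the raw one, which is why entries above the diagonal in a table of adjusted values are at least as large as their counterparts below it.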

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests, based upon an article by Steiger (1980),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53
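The numbers in this first case can be reproduced with the textbook formulas. The sketch below assumes the usual t test for a correlation with n - 2 degrees of freedom and a Fisher z confidence interval; it is an illustration of those formulas, not the source of r.test.

```r
n <- 50
r <- 0.3

# t test of a single correlation against zero, df = n - 2.
t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_value), df = n - 2)

# 95% confidence interval via the Fisher z transformation:
# z = atanh(r) has standard error 1 / sqrt(n - 3).
z <- atanh(r)
se_z <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se_z)

print(round(c(t = t_value, p = p_value), 3))
print(round(ci, 2))
```

Rounded to two decimals, the t value and the interval agree with the r.test output shown above.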

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability 0.32
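The second case follows directly from the description: transform each correlation to Fisher z, take the difference, and divide by its standard error. The sketch assumes the standard error sqrt(1/(n1 - 3) + 1/(n2 - 3)) for two independent samples; the variable names are illustrative.

```r
n1 <- 30
n2 <- 30     # n2 = n1 when not specified
r12 <- 0.4
r34 <- 0.6

# Difference between the Fisher z transformed correlations.
z_diff <- atanh(r12) - atanh(r34)

# Standard error of the difference of two independent z scores.
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))

z_value <- abs(z_diff) / se_diff
p_value <- 2 * pnorm(-z_value)   # two-tailed probability

print(round(c(z = z_value, p = p_value), 2))
```

Rounded to two decimals, these values agree with the z value and probability reported by r.test above.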

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although the result is obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
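The statistic described here can be sketched in a few lines of base R. The scaling by (n - 3) and the use of the off diagonal Fisher z values are assumptions based on the description above; the helper name steiger_chisq is hypothetical and the data are simulated, so this is not the cortest implementation.

```r
# Chi square test that all off diagonal population correlations are zero.
steiger_chisq <- function(R, n) {
  z <- atanh(R[lower.tri(R)])          # Fisher z of each unique correlation
  chi2 <- (n - 3) * sum(z^2)           # each z has variance 1 / (n - 3)
  df <- length(z)                      # p * (p - 1) / 2 unique correlations
  list(chi2 = chi2, df = df,
       p = pchisq(chi2, df = df, lower.tail = FALSE))
}

# Simulated data: four variables, two of which are made to correlate.
set.seed(42)
x <- matrix(rnorm(200 * 4), ncol = 4)
x[, 2] <- x[, 1] + rnorm(200)

res <- steiger_chisq(cor(x), n = nrow(x))
print(res)
```

For six variables, as in sat.act, the degrees of freedom would be 6 * 5 / 2 = 15, which matches the df in the output above.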

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
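A small simulation shows how much φ can underestimate the latent correlation. The sketch draws bivariate normal data with a latent correlation of 0.5 and cuts both variables at zero; the sample size, seed, and cut points are arbitrary choices for illustration. For cuts at the median, φ is known to equal (2/π) asin(ρ), which is 1/3 when ρ = 0.5.

```r
set.seed(1)
n <- 100000
rho <- 0.5

# Bivariate normal data with latent correlation rho.
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Dichotomize both variables at zero (the median).
x_cut <- as.integer(x > 0)
y_cut <- as.integer(y > 0)

r_latent <- cor(x, y)        # Pearson r on the continuous variables
phi <- cor(x_cut, y_cut)     # Pearson r on the 0/1 variables, i.e., phi

print(round(c(latent = r_latent, phi = phi,
              expected_phi = (2 / pi) * asin(rho)), 3))
```

The tetrachoric correlation is an attempt to recover the latent value from the two by two table alone.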

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will also sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
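The smoothing steps just described can be sketched in base R. This is an illustrative version under stated assumptions, not the cor.smooth source: the helper name smooth_cor, the tolerance eps, and the replacement value 100 * eps are choices made here, and the input matrix is a deliberately impossible set of correlations.

```r
# Make a correlation matrix positive definite by repairing its eigenvalues.
smooth_cor <- function(R, eps = 1e-12) {
  e <- eigen(R, symmetric = TRUE)
  vals <- e$values
  if (min(vals) >= eps) return(R)               # nothing to fix
  vals[vals < eps] <- 100 * eps                 # lift offending eigenvalues
  vals <- vals * length(vals) / sum(vals)       # rescale to sum to the number of variables
  R_new <- e$vectors %*% diag(vals) %*% t(e$vectors)
  R_new <- cov2cor(R_new)                       # restore a unit diagonal
  dimnames(R_new) <- dimnames(R)
  R_new
}

# Hypothetical improper matrix: its three correlations are mutually inconsistent.
R_bad <- matrix(c(1, .9, .9,  .9, 1, -.9,  .9, -.9, 1), 3, 3)
R_ok <- smooth_cor(R_bad)

print(round(eigen(R_bad, symmetric = TRUE)$values, 3))
print(round(R_ok, 3))
```

The smoothed matrix keeps a unit diagonal but moves the off diagonal values to the nearest consistent pattern, so it should be inspected rather than used blindly.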

                                  4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

> draw.tetra()

[Figure: bivariate normal scatter with rho = 0.5 and phi = 0.33, cut at thresholds τ on X and Τ on Y into four quadrants (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with the marginal normal densities dnorm(x) marked for X > τ and Y > Τ. Both axes run from -3 to 3.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Figure: perspective plot of the bivariate normal density (axes x, y, z), titled "Bivariate density rho = 0.5".]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values, or the group means.
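Equation (1) can be checked numerically. The sketch below uses only base R and simulated data (the group sizes, effect sizes, and variable names are illustrative assumptions); it does not call statsBy. The between group correlation is computed on the group means replicated for every observation, which gives the weighted correlation of the means.

```r
# Sketch: verify r_xy = eta_xwg*eta_ywg*r_xywg + eta_xbg*eta_ybg*r_xybg
set.seed(42)
g  <- rep(1:5, each = 40)                    # 5 groups of 40 (assumed design)
gx <- rnorm(5)                               # group effects on x
gy <- 0.8 * gx + rnorm(5, sd = 0.5)          # group effects on y (positive between)
ex <- rnorm(200)                             # within-group part of x
x  <- gx[g] + ex
y  <- gy[g] - 0.5 * ex + rnorm(200)          # negative within-group relation

xb <- ave(x, g)                              # group means, one value per observation
yb <- ave(y, g)
xw <- x - xb                                 # within-group deviations
yw <- y - yb

r_xy <- cor(x, y)
rhs  <- cor(x, xw) * cor(y, yw) * cor(xw, yw) +   # eta_wg terms * r_wg
        cor(x, xb) * cor(y, yb) * cor(xb, yb)     # eta_bg terms * r_bg

c(observed = r_xy, decomposed = rhs,
  within = cor(xw, yw), between = cor(xb, yb))
all.equal(r_xy, rhs)
```

In this simulated case the within and between group correlations have opposite signs, so the overall correlation describes neither level well.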

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)

faBy(sb,nfactors=5) #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.69     0.63         0.50      0.58        0.48

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.48     0.40         0.25      0.34        0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary SentCompletion FirstLetters
     3.69       3.88           3.00         1.35

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.59     0.58         0.49      0.58        0.45

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.34     0.34         0.24      0.33        0.20

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohen's Set Correlation R2 = 0.69

Unweighted correlation between the two sets = 0.73
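The reason a correlation matrix is sufficient input is that the standardized weights depend only on the correlations: β = Rxx⁻¹ Rxy and R² = β′ Rxy. The following base R sketch shows the arithmetic on a small, made-up correlation matrix (the values and variable names are assumptions for illustration); it is not the setCor source code.

```r
# Sketch: standardized regression weights from a correlation matrix alone.
v <- c("x1", "x2", "y")
R <- matrix(c(1.0, 0.3, 0.5,
              0.3, 1.0, 0.4,
              0.5, 0.4, 1.0), nrow = 3, byrow = TRUE,
            dimnames = list(v, v))           # hypothetical correlations

x <- c("x1", "x2")                           # predictor set
y <- "y"                                     # criterion

Rxx  <- R[x, x]
Rxy  <- R[x, y, drop = FALSE]

beta <- solve(Rxx, Rxy)                      # beta = Rxx^-1 Rxy
R2   <- as.numeric(t(beta) %*% Rxy)          # squared multiple correlation
vif  <- diag(solve(Rxx))                     # 1/(1 - SMC) for each predictor

round(beta, 3)
round(c(R = sqrt(R2), R2 = R2), 3)
round(vif, 3)
```

Standard errors and significance tests additionally need the number of observations, which is why it has to be supplied when the input is a matrix.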

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.58     0.46         0.21      0.18        0.30

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
          0.331    0.210        0.043     0.032       0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion FirstLetters
          1.02         1.02

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.44     0.35         0.17      0.14        0.26

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.19     0.12         0.03      0.02        0.07

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21

Cohen's Set Correlation R2 = 0.42

Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76 SE = 0.31 t direct = 2.5 with probability = 0.019

Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 SE = 0.32 t direct = 1.35 with probability = 0.19

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33

Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.04 Upper CI = 0.69

R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
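The quantities in this output can be reproduced in outline with ordinary regressions. The sketch below uses base R and simulated data (the variable names, effect sizes, and the 500 bootstrap samples are assumptions); it uses a simple percentile bootstrap, which is not necessarily the procedure mediate uses internally. Here the total effect is written c and the direct effect c' (c_direct in the code).

```r
# Sketch: a, b, total (c), direct (c') and indirect (ab) effects via lm()
set.seed(1)
n <- 200
x <- rnorm(n)                                # predictor
m <- 0.6 * x + rnorm(n)                      # mediator
y <- 0.4 * m + 0.3 * x + rnorm(n)            # criterion

c_total  <- unname(coef(lm(y ~ x))["x"])     # total effect of x on y
a_path   <- unname(coef(lm(m ~ x))["x"])     # x -> m
fit      <- lm(y ~ x + m)
c_direct <- unname(coef(fit)["x"])           # x -> y with m in the model
b_path   <- unname(coef(fit)["m"])           # m -> y with x in the model
ab       <- a_path * b_path                  # indirect effect

# For least squares with one mediator, ab equals the drop from c to c'
all.equal(ab, c_total - c_direct)

# Percentile bootstrap of the indirect effect (500 resamples, for speed)
boot_ab <- replicate(500, {
  i  <- sample.int(n, replace = TRUE)
  xi <- x[i]; mi <- m[i]; yi <- y[i]
  unname(coef(lm(mi ~ xi))["xi"] * coef(lm(yi ~ xi + mi))["mi"])
})
round(c(ab = ab, boot.mean = mean(boot_ab), sd = sd(boot_ab),
        quantile(boot_ab, c(0.025, 0.975))), 2)
```

A confidence interval for ab that excludes zero is the usual evidence for mediation, which is how the Lower CI and Upper CI values in the output above are read.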

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model": THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, THERAPY to SATIS c = 0.76 and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct path, c', is .43), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)

> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models" with THERAPY and ATTRIB predicting SATIS; displayed values 0.43, 0.4 and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$
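The formula can be worked through in base R. The sketch below is an illustration rather than the setCor code: it uses the built-in mtcars data purely as a convenient example (the choice of variables is an assumption), takes the eigen values as the squared canonical correlations between the two sets, and checks the result against the equivalent determinant form 1 - |R| / (|Rxx| |Ryy|).

```r
# Sketch: Cohen's set correlation from the squared canonical correlations
R <- cor(mtcars[, c("mpg", "hp", "wt", "disp", "qsec")])
x <- c("hp", "wt")                           # first set (illustrative choice)
y <- c("mpg", "disp", "qsec")                # second set

M      <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
lambda <- Re(eigen(M)$values)                # squared canonical correlations

R2_set <- 1 - prod(1 - lambda)
R2_det <- 1 - det(R[c(x, y), c(x, y)]) / (det(R[x, x]) * det(R[y, y]))

round(c(lambda = lambda, set.R2 = R2_set, avg.sq.cc = mean(lambda)), 3)
all.equal(R2_set, R2_det)

# The same arithmetic applied to the four squared canonical correlations
# reported for the Thurstone example in section 5.1
thurstone_cc <- c(0.6280, 0.1478, 0.0076, 0.0049)
round(1 - prod(1 - thurstone_cc), 2)         # Cohen's set correlation R2
round(mean(thurstone_cc), 2)                 # average squared canonical correlation
```

The determinant form makes it clear that the statistic does not change when the roles of the two sets are swapped, and the comparison with the average squared canonical correlation shows how one large canonical correlation can dominate the product.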

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")

> model1 <- lm(ACT~ gender + education + age, data=sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act, mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58 SE = 0.03 t direct = 19.25 with probability = 0

Direct effect (c') of ACT on SATQ removing education = 0.59 SE = 0.03 t direct = 19.26 with probability = 0

Indirect effect (ab) of ACT on SATQ through education = -0.01

Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.02 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14 SE = 0.03 t direct = -4.78 with probability = 2.1e-06

Direct effect (c') of gender on NA removing education = -0.14 SE = 0.03 t direct = -4.63 with probability = 4.4e-06

Indirect effect (ab) of gender on SATQ through education = 0

Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.01 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 0.03 t direct = 0.02 with probability = 0.99

Direct effect (c') of ACTXgndr on NA removing education = 0 SE = 0.03 t direct = 0.01 with probability = 0.99

Indirect effect (ab) of ACTXgndr on SATQ through education = 0

Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = 0 Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: path diagram titled "Moderation model" with ACT, gender and ACTXgndr predicting SATQ through education. Paths to education: 0.16, 0.09, -0.01; education to SATQ: -0.04; effects on SATQ: ACT c = 0.58, c' = 0.59; gender c = -0.14, c' = -0.14; ACTXgndr c = 0, c' = 0; other displayed values -0.07 and 0.02.]

Figure 18: Moderated multiple regression requires the raw data.

Compare this with the output from setCor.

> #compare with setCor

> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender education age
  1.01      1.45 1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1] 3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1 5.6

Average squared canonical correlation = 0.03

Cohen's Set Correlation R2 = 0.09

Shrunken Set Correlation R2 = 0.08

F and df of Cohen's Set Correlation 7.26 9 1681.86

Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable         MR1   MR2   MR3   h2   u2  com
Sentences       0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary      0.89  0.06 -0.03 0.84 0.16 1.01
SentCompletion  0.83  0.04  0.00 0.73 0.27 1.00
FirstLetters    0.00  0.86  0.00 0.73 0.27 1.00
4LetterWords   -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes        0.18  0.63 -0.08 0.50 0.50 1.20
LetterSeries    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees       0.37 -0.05  0.47 0.50 0.50 1.93
LetterGroup    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings     2.64  1.86  1.5

MR1             1.00  0.59  0.54
MR2             0.59  1.00  0.52
MR3             0.54  0.52  1.00


                                  7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
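A few of the helpers in this list are simple enough to write out by hand. The sketch below reproduces the arithmetic in base R under the usual textbook definitions (assumed here, not quoted from the psych source), and prints the psych results alongside only if the package happens to be installed.

```r
# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r)) = atanh(r)
r <- c(0.10, 0.50, 0.90)
z <- atanh(r)

# Geometric and harmonic means of positive numbers
x   <- c(1, 2, 4, 8)
geo <- exp(mean(log(x)))     # fourth root of 64, i.e. sqrt(8)
har <- 1 / mean(1 / x)       # 4 / 1.875, i.e. 32/15

# "Super matrix": A on the top left, B on the lower right, 0s elsewhere
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
S <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
S[seq_len(nrow(A)), seq_len(ncol(A))] <- A
S[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
print(S)

# Optional comparison with the psych versions
if (requireNamespace("psych", quietly = TRUE)) {
  print(cbind(base = z, psych = psych::fisherz(r)))
  print(c(base = geo, psych = psych::geometric.mean(x)))
  print(c(base = har, psych = psych::harmonic.mean(x)))
}
```

The block-diagonal layout of S is what makes the super matrix convenient for scoring keys: items belonging to one key have zero weight on every scale of the other.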

                                  8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
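As an illustration of the scaling use mentioned for cities, the sketch below runs classical multidimensional scaling in base R. It uses UScitiesD, a distance object for 10 US cities that ships with base R, as a stand-in for the psych cities matrix. The final lines assume that Thurstone and sat.act are available from an installed psych and simply print their dimensions for comparison with the descriptions above.

```r
# Classical (metric) multidimensional scaling of intercity distances.
# UScitiesD is a base R stand-in, not the psych cities data set.
fit <- cmdscale(UScitiesD, k = 2)

plot(fit, type = "n", xlab = "Dimension 1", ylab = "Dimension 2",
     main = "MDS of US city distances")
text(fit, labels = rownames(fit))

# Optional look at two of the psych data sets described in the list
if (requireNamespace("psych", quietly = TRUE)) {
  print(dim(psych::Thurstone))   # described in the text as 9 x 9
  print(dim(psych::sat.act))     # described in the text as 700 subjects
}
```

The two recovered dimensions correspond roughly to east-west and north-south position, although the orientation of the axes is arbitrary.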

                                  9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
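From the R console, the same installation can be sketched with install.packages pointed at that repository. The run_install switch is a hypothetical guard added here so that the snippet does nothing unless it is deliberately turned on. Installing from source may also require the usual build tools.

```r
# Repository named in the text
repo <- "http://personality-project.org/r"

# Hypothetical safety switch: set to TRUE to actually install
run_install <- FALSE

if (run_install) {
  install.packages("psych", repos = repo, type = "source")
}
```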

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")


                                  10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                  11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0
[9] lattice_0.20-34


                                  References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.




> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")

[Figure: Density Plot by gender for SAT V and Q. x axis groups: SATV M, SATV F, SATQ M, SATQ F; y axis: Observed, 200 to 800.]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.


3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draw the confidence intervals for an x set and a y set of the same size.
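The quantities behind these displays can be written out directly. The sketch below is base R arithmetic on simulated data and assumes the normal quantile for the 95% interval. The text does not state which quantile error.bars uses internally, so the numbers may differ slightly from what the function draws. The final call runs only if psych is installed.

```r
set.seed(42)                        # arbitrary seed, for reproducibility only
x <- rnorm(100, mean = 50, sd = 10) # simulated scores

n  <- length(x)
se <- sd(x) / sqrt(n)               # standard error of the mean
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se   # normal-theory 95% interval

# Standard error of a proportion: sigma_p = sqrt(p * q / N), with q = 1 - p
p    <- 0.30
N    <- 200
se_p <- sqrt(p * (1 - p) / N)       # about 0.0324

if (requireNamespace("psych", quietly = TRUE)) {
  d <- data.frame(a = x, b = x + rnorm(100, mean = 5, sd = 10))
  psych::error.bars(d)              # confidence intervals for each column
}
```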

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a “lie” scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.)

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

[Plot: 0.95 confidence limits; x axis: Independent Variable (bfagree, bfcon, bfext, bfneur, bfopen); y axis: Dependent Variable]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The “cats eyes” show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+ labels=c("Male","Female"),ylab="SAT score",xlab="")

[Plot: 0.95 confidence limits; bars for Male and Female; y axis: SAT score, 200 to 800]

Figure 7: A “Dynamite plot” of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+ main="Proportion of sample by education level")

[Plot: Proportion of sample by education level; x axis: Level of Education (M 0 to M 5); y axis: Proportion of Education Level, 0.00 to 0.30]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both “Energetic Arousal” and “Tense Arousal” can be seen in one graph and compared to the same movie manipulations on “Positive Affect” and “Negative Affect”. Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Plot, left panel: Movies effect on arousal; x axis: Energetic Arousal; y axis: Tense Arousal; points labeled Sad, Horror, Neutral, Happy]
[Plot, right panel: Movies effect on affect; x axis: Positive Affect; y axis: Negative Affect; points labeled Sad, Horror, Neutral, Happy]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV  SATQ
education        NA  0.09 0.00 -0.05  0.05
age            0.61    NA 0.07 -0.03  0.13
ACT            0.16  0.15   NA  0.08  0.02
SATV           0.02 -0.06 0.61    NA  0.05
SATQ           0.08  0.04 0.60  0.68    NA
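To make the layout of these combined matrices concrete, here is a minimal base R sketch of the same idea. The helper `combine_lower_upper` is hypothetical (it is not a psych function, and lowerUpper itself may differ in details); the two small matrices reuse the education, age, and ACT correlations printed above.

```r
# Hypothetical helper, not part of psych: combine two symmetric matrices.
combine_lower_upper <- function(lower, upper, diff = FALSE) {
  stopifnot(all(dim(lower) == dim(upper)))
  out <- lower                                   # keep `lower` below the diagonal
  above <- upper.tri(out)
  # above the diagonal: either `upper` itself or the difference lower - upper
  out[above] <- if (diff) (lower - upper)[above] else upper[above]
  diag(out) <- NA                                # the diagonal is not meaningful
  out
}

vars <- c("education", "age", "ACT")
males <- matrix(c(1, .61, .16,  .61, 1, .15,  .16, .15, 1), 3,
                dimnames = list(vars, vars))
females <- matrix(c(1, .52, .16,  .52, 1, .08,  .16, .08, 1), 3,
                  dimnames = list(vars, vars))

both  <- combine_lower_upper(males, females)
diffs <- combine_lower_upper(males, females, diff = TRUE)
round(diffs, 2)
```

The entries above the diagonal of `diffs` are the male correlation minus the female correlation, matching the sign convention of the output shown above.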

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a “heat map” of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use “spider” plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these raw probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
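The Holm adjustment itself is easy to inspect with p.adjust from the stats package. The sketch below uses six hypothetical raw p values (they are not taken from any data set in this document) and reproduces the adjustment by hand to show what the correction does.

```r
# Hypothetical raw p values for six correlations (illustration only)
raw_p <- c(0.001, 0.012, 0.020, 0.030, 0.250, 0.600)

holm <- p.adjust(raw_p, method = "holm")

# Holm by hand: multiply the i-th smallest p by (m - i + 1),
# enforce monotonicity with a running maximum, and cap at 1.
m <- length(raw_p)
o <- order(raw_p)
by_hand <- numeric(m)
by_hand[o] <- pmin(1, cummax((m - seq_len(m) + 1) * raw_p[o]))

cbind(raw_p, holm, by_hand)
```

The smallest p value is penalized most heavily (multiplied by the number of tests), which is why several correlations that look significant below the diagonal of Table 1 are no longer so above it.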

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18 with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8) #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2 with probability 0.23
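The first two cases can be checked by hand in base R. The sketch below follows the textbook formulas (a t test for a single correlation, a Fisher z confidence interval, and a z test for two independent correlations); it is an illustration of those formulas, not a copy of the r.test source, although the rounded results should agree with the output for cases 1 and 2 above.

```r
# Case 1: t test and Fisher z confidence interval for a single correlation
n <- 50; r <- 0.3
t_val <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val <- 2 * pt(-abs(t_val), df = n - 2)
ci    <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))
round(c(t = t_val, p = p_val, lower = ci[1], upper = ci[2]), 3)

# Case 2: difference between two independent correlations
n1 <- 30; n2 <- 30; r12 <- 0.4; r34 <- 0.6
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))     # SE of a difference of two z's
z_val   <- abs(atanh(r12) - atanh(r34)) / se_diff
p_diff  <- 2 * pnorm(-z_val)
round(c(z = z_val, p = p_diff), 2)
```

Cases 3 and 4 involve dependent correlations and need the additional correlations among the variables, so they are not reproduced here.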

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
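The logic of this test can be sketched in a few lines of base R. The example below uses the built-in `attitude` data (an arbitrary choice, so that psych is not required) and assumes the usual sampling variance of 1/(n - 3) for a Fisher z; cortest may differ in details, so treat this as an outline of the idea rather than its implementation.

```r
# Sketch: chi square from squared Fisher z values (base R, built-in data).
R <- cor(attitude)
n <- nrow(attitude)              # number of observations
p <- ncol(R)                     # number of variables

z    <- atanh(R[lower.tri(R)])   # Fisher z of each unique off-diagonal correlation
chi2 <- (n - 3) * sum(z^2)       # each z has sampling variance 1 / (n - 3)
df   <- p * (p - 1) / 2          # one term per unique correlation
p_value <- pchisq(chi2, df, lower.tail = FALSE)

c(chi2 = chi2, df = df, p = p_value)
```

With six variables, as in sat.act, the same formula for the degrees of freedom gives 6 * 5 / 2 = 15, which matches the df reported above.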

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
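A small simulation shows how much φ can underestimate the latent correlation. The sketch below is base R only, with a hypothetical sample size and seed. It cuts both latent variables at their medians, which is the one special case where the latent correlation can be recovered in closed form as sin(πφ/2); the general tetrachoric estimate needs the iterative fitting that psych provides.

```r
set.seed(1)
n   <- 100000                                  # hypothetical sample size
rho <- 0.5                                     # latent (bivariate normal) correlation
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)      # y correlates rho with x

# Dichotomize both latent variables at 0 (their medians)
xd <- as.numeric(x > 0)
yd <- as.numeric(y > 0)

phi <- cor(xd, yd)                             # Pearson r on the 0/1 data
rho_hat <- sin(pi / 2 * phi)                   # valid for median cuts only

round(c(latent = cor(x, y), phi = phi, recovered = rho_hat), 2)
```

For a latent correlation of 0.5 cut at the medians, the expected φ is 1/3, in line with the values rho = 0.5 and phi = 0.33 shown in Figure 14.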

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix assembled from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a “smoothed” correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
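The smoothing steps just described can be sketched in base R. The function `smooth_sketch`, its tolerance, and the deliberately inconsistent 3 x 3 matrix are all hypothetical; cor.smooth may choose its eigenvalue floor differently, so this shows the idea rather than the package code.

```r
# Hypothetical, deliberately inconsistent "correlation" matrix
R <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)
min(eigen(R)$values)                 # negative: R is not positive semi-definite

smooth_sketch <- function(R, tol = 1e-8) {
  e <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, 100 * tol)                 # lift eigenvalues below the floor
  vals <- vals * nrow(R) / sum(vals)                # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)    # rebuild the matrix
  cov2cor(S)                                        # restore a unit diagonal
}

R_s <- smooth_sketch(R)
min(eigen(R_s)$values)               # now positive
round(R_s, 2)
```

The smoothed matrix is a proper correlation matrix again, at the cost of shrinking the offending correlations slightly.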

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use

> draw.tetra()

[Plot: bivariate normal distribution cut at thresholds τ (for X) and Τ (for Y), with the four quadrants labeled by X and Y above or below their thresholds and the marginal normal densities shown; rho = 0.5, phi = 0.33]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Plot: Bivariate density, rho = 0.5; axes x, y, z]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
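Equation 1 is an exact identity, which can be verified numerically. The sketch below uses base R and simulated data; the grouping variable `g`, the group sizes, and the effect sizes are hypothetical. Group means are expanded back to the individual level with `ave`, so the between group correlation is implicitly weighted by group size.

```r
set.seed(123)
g <- rep(1:10, each = 20)                       # hypothetical grouping variable
group_effect <- rnorm(10)[g]                    # shared between-group component
x <- group_effect + rnorm(200)
y <- 0.5 * group_effect + 0.3 * x + rnorm(200)

# Split each variable into group means (between) and deviations (within)
x_bg <- ave(x, g); x_wg <- x - x_bg
y_bg <- ave(y, g); y_wg <- y - y_bg

eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)
r_wg <- cor(x_wg, y_wg)                         # pooled within-group correlation
r_bg <- cor(x_bg, y_bg)                         # correlation of the (weighted) group means

r_xy      <- cor(x, y)
r_rebuilt <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
c(observed = r_xy, rebuilt = r_rebuilt)
```

The two values agree to machine precision because the within group deviations are uncorrelated with the group means by construction.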

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5) #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
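Why a correlation matrix is enough can be seen in a short base R sketch. It uses the built-in `mtcars` data with an arbitrary choice of variables (none of this comes from psych), solves the normal equations on the correlation matrix, and checks the result against lm on standardized raw data. The variance inflation factor is computed as 1/(1 - SMC), where SMC is the squared multiple correlation of each predictor with the others.

```r
# Base R sketch with built-in data; the variable choice is arbitrary.
vars <- c("wt", "hp", "disp", "mpg")
R <- cor(mtcars[, vars])

x <- c("wt", "hp", "disp")                  # predictors
y <- "mpg"                                  # criterion

Rxx <- R[x, x]
Rxy <- R[x, y, drop = FALSE]

beta <- solve(Rxx, Rxy)                     # standardized regression weights
R2   <- sum(beta * Rxy)                     # squared multiple correlation
smc  <- 1 - 1 / diag(solve(Rxx))            # SMC of each predictor with the others
vif  <- 1 / (1 - smc)                       # variance inflation factor

# Check against lm() on standardized raw data
fit <- lm(mpg ~ wt + hp + disp, data = as.data.frame(scale(mtcars[, vars])))
round(cbind(from_R = beta[, 1], from_lm = coef(fit)[x]), 3)
```

The standardized weights and the R2 depend only on the correlations, so the raw data are not needed for them; standard errors and tests additionally require the sample size.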

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.69      0.63          0.50       0.58         0.48

multiple R2
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.48      0.40          0.25       0.34         0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences  Vocabulary  SentCompletion  FirstLetters
     3.69        3.88            3.00          1.35

Unweighted multiple R
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.59      0.58          0.49       0.58         0.45

Unweighted multiple R2
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.34      0.34          0.24       0.33         0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
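The VIF line in this output is defined as 1/(1-SMC), where SMC is the squared multiple correlation of each predictor with the other predictors. A minimal base R sketch of that definition follows; the correlation matrix is made up for illustration and is not the Thurstone data.

```r
# VIF = 1 / (1 - SMC), computed from a predictor correlation matrix.
# Rxx holds illustrative values only (variables a, b, c are hypothetical).
Rxx <- matrix(c(1.0, 0.8, 0.3,
                0.8, 1.0, 0.2,
                0.3, 0.2, 1.0), nrow = 3,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

Rinv <- solve(Rxx)
smc  <- 1 - 1 / diag(Rinv)   # squared multiple correlation of each predictor with the rest
vif  <- 1 / (1 - smc)        # algebraically the diagonal of the inverse matrix
round(rbind(smc, vif), 2)
```

Predictors that are highly correlated with each other (a and b here) get large VIF values, as Sentences, Vocabulary, and SentCompletion do in the output above.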

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.58      0.46          0.21       0.18         0.30

multiple R2
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
          0.331     0.210         0.043      0.032        0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion  FirstLetters
          1.02          1.02

Unweighted multiple R
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.44      0.35          0.17       0.14         0.26

Unweighted multiple R2
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.19      0.12          0.03       0.02         0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
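The residual matrix printed above has diagonal entries below 1, which is what one expects when a set of predictors is partialled out of standardized criteria. A minimal base R sketch of that matrix algebra, under the assumption that the residual is the usual Ryy - Ryx Rxx^-1 Rxy, is shown here with simulated data (the variable names are illustrative, and this is not the setCor source).

```r
# Residual (partial) covariance of a y set after removing an x set.
# Simulated data; assumes the standard formula Ryy - Ryx Rxx^-1 Rxy.
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y1 <- 0.6 * x1 + rnorm(n)
y2 <- 0.4 * x1 + 0.3 * x2 + rnorm(n)

R <- cor(cbind(x1, x2, y1, y2))
x <- c("x1", "x2")
y <- c("y1", "y2")

# Matrix form: what is left of the y correlations once x is removed
resid_mat <- R[y, y] - R[y, x] %*% solve(R[x, x]) %*% R[x, y]

# Cross-check: covariance of residuals from regressions on standardized data
z <- as.data.frame(scale(cbind(x1, x2, y1, y2)))
resid_check <- cov(resid(lm(cbind(y1, y2) ~ x1 + x2, data = z)))

round(resid_mat, 2)
```

Each diagonal element equals 1 minus the squared multiple correlation of that criterion with the removed set.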

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
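The a, b, and ab paths, and a bootstrap of ab, can be sketched with lm in base R. This is an illustration of the logic only; the mediate function may differ in its details. The data are simulated and all object names are hypothetical. Following the labels in the mediate output, c_total is the total effect of x and c_prime is the direct effect with m in the model.

```r
# Simple mediation with ordinary least squares plus a percentile bootstrap.
# Simulated data; not the Preacher and Hayes example.
set.seed(123)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.3 * x + 0.4 * m + rnorm(n)
d <- data.frame(x, m, y)

a       <- coef(lm(m ~ x, data = d))["x"]   # path from x to m
fit_y   <- lm(y ~ x + m, data = d)
b       <- coef(fit_y)["m"]                 # path from m to y, controlling for x
c_prime <- coef(fit_y)["x"]                 # direct effect of x with m in the model
c_total <- coef(lm(y ~ x, data = d))["x"]   # total effect of x
ab      <- a * b                            # indirect effect

# Resample cases with replacement and recompute ab each time
n_iter <- 500
boot_ab <- replicate(n_iter, {
  s <- d[sample(n, replace = TRUE), ]
  coef(lm(m ~ x, data = s))["x"] * coef(lm(y ~ x + m, data = s))["m"]
})
quantile(boot_ab, c(0.025, 0.975))          # percentile confidence interval for ab
```

With least squares estimates the total effect decomposes exactly as c_total = c_prime + ab, which matches the pattern 0.76 = 0.43 + 0.33 in the Preacher and Hayes output.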


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   SE = 0.31  t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32  t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Figure: "Mediation model" path diagram. THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct effect with Attribution removed is .43), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

[Figure: "Regression Models" path diagram. THERAPY to SATIS = 0.43; ATTRIB to SATIS = 0.4; THERAPY with ATTRIB = 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$
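As a check on the definition, the eigenvalues can be computed directly and compared with the squared canonical correlations from cancor in base R. The sketch assumes the standard canonical correlation matrix Rxx^-1 Rxy Ryy^-1 Ryx, where Ryy holds the correlations within the y set and Ryx is the transpose of Rxy. The data are simulated and the object names are illustrative.

```r
# Set correlation from the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx (base R only).
# Simulated data: three x variables and two y variables.
set.seed(11)
n <- 500
x <- matrix(rnorm(n * 3), n, 3)
y <- cbind(x[, 1] + rnorm(n), x[, 2] + x[, 3] + rnorm(n))

Rxx <- cor(x)
Ryy <- cor(y)
Rxy <- cor(x, y)

M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)

# M is 3 x 3 but only min(3, 2) = 2 eigenvalues are nonzero
lambda <- sort(Re(eigen(M)$values), decreasing = TRUE)[1:2]
set_R2 <- 1 - prod(1 - lambda)

# Cross-check against the canonical correlations from stats::cancor
lambda_check <- cancor(x, y)$cor^2
round(rbind(lambda, lambda_check), 4)
```

The nonzero eigenvalues are the squared canonical correlations, so the set correlation can be read as one minus the product of the variance left unexplained along each canonical dimension.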

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
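The gap between the two statistics can be seen by plugging in the squared canonical correlations reported for the Thurstone example in section 5.1 (.6280, .1478, .0076, .0049). This is plain arithmetic in base R, not a call to setCor.

```r
# Set correlation versus average squared canonical correlation,
# using the squared canonical correlations printed for the Thurstone example.
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)

set_R2  <- 1 - prod(1 - lambda)   # Cohen's set correlation
mean_cc <- mean(lambda)           # average squared canonical correlation

round(c(set_R2 = set_R2, mean_cc = mean_cc), 2)
# set_R2 = 0.69, mean_cc = 0.20
```

One large canonical correlation is enough to push the set correlation to .69, while the average across the four dimensions is only .20.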

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   SE = 0.03  t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   SE = 0.03  t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02   Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   SE = 0.03  t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   SE = 0.03  t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01   Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   SE = 0.03  t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   SE = 0.03  t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0   Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: "Moderation model" path diagram. ACT, gender, and ACTXgndr predict SATQ through education. Paths to education: 0.16 (ACT), 0.09 (gender), -0.01 (ACTXgndr); education to SATQ = -0.04; c = 0.58, c' = 0.59 (ACT); c = -0.14, c' = -0.14 (gender); c = 0, c' = 0 (ACTXgndr).]

Figure 18: Moderated multiple regression requires the raw data.

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

                  MR1    MR2    MR3
MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random: Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex: Is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex: Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor: One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz: Convert a correlation to the corresponding Fisher z score.

geometric.mean: Also harmonic.mean. Find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa: Are typically used to find the reliability for raters.

headtail: Combines the head and tail functions to show the first and last lines of a data set or output.

topBottom: Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia: Calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep: Finds the probability of replication for an F, t, or r and estimates effect size.

partial.r: Partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection: Will correct correlations for restriction of range.

reverse.code: Will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix: Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
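Several of the helpers in this list wrap short, standard formulas. The base R sketch below shows the textbook computations behind a Fisher z transformation, the geometric and harmonic means, and a block-diagonal "super matrix". It is not the psych source code, and the object names are illustrative.

```r
# Textbook versions of a few helper computations (base R only).
r <- c(0.1, 0.5, 0.9)
z <- atanh(r)                      # Fisher r-to-z: 0.5 * log((1 + r) / (1 - r))

x <- c(1, 2, 4, 8)
geo_mean  <- exp(mean(log(x)))     # geometric mean, for ratio-like data
harm_mean <- 1 / mean(1 / x)       # harmonic mean, for rates

# Block-diagonal matrix: A on the top left, B on the lower right, 0s elsewhere
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))
S
```

The psych functions add conveniences such as handling of missing data and matrix input that this sketch leaves out.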

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


                                    iq 14 multiple choice ability items were included as part of the Synthetic Aperture Person-ality Assessment (SAPA) web based personality assessment project The data from1000 subjects are included here as a demonstration set for scoring multiple choiceinventories and doing basic item statistics

                                    galton Two of the earliest examples of the correlation coefficient were Francis Galtonrsquosdata sets on the relationship between mid parent and child height and the similarity ofparent generation peas with child peas galton is the data set for the Galton heightpeas is the data set Francis Galton used to ntroduce the correlation coefficient withan analysis of the similarities of the parent and child generation of 700 sweet peas

                                    Dwyer Dwyer (1937) introduced a method for factor extension (see faextension thatfinds loadings on factors from an original data set for additional (extended) variablesThis data set includes his example

                                    miscellaneous cities is a matrix of airline distances between 11 US cities and maybe used for demonstrating multiple dimensional scaling vegetables is a classicdata set for demonstrating Thurstonian scaling and is the preference matrix of 9vegetables from Guilford (1954) Used by Guilford (1954) Nunnally (1967) Nunnallyand Bernstein (1984) this data set allows for examples of basic scaling techniques
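The cities matrix is described above as suitable for multidimensional scaling. A minimal sketch of that idea in base R follows. It does not use the cities data: the four points are made up for illustration so that the recovered configuration can be checked against known distances.

```r
# Four hypothetical points in the plane (illustrative, not the cities data)
xy <- matrix(c(0, 0,
               3, 0,
               3, 4,
               0, 4), ncol = 2, byrow = TRUE,
             dimnames = list(c("A", "B", "C", "D"), c("x", "y")))

d <- dist(xy)                 # Euclidean distances between the points
fit <- cmdscale(d, k = 2)     # classical (metric) multidimensional scaling

# The solution is unique only up to rotation and reflection, so compare
# inter-point distances rather than the coordinates themselves.
max_error <- max(abs(dist(fit) - d))
print(round(as.matrix(dist(fit)), 2))
```

With psych loaded, the analogous call on the real data would presumably be cmdscale(cities), assuming cities is a symmetric distance matrix as the description suggests.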

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")
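The query passed to news compares version numbers rather than plain strings. The following base R sketch shows why that matters. The version strings are illustrative only, apart from 1.7.4.21, which is the version reported in the SessionInfo section.

```r
# Version strings compare component by component once they are
# numeric_version objects; plain strings compare character by character.
v <- numeric_version(c("1.6.12", "1.7.0", "1.7.4.21", "1.10.0"))

newer <- v > "1.7.0"             # component-wise comparison
as_text <- "1.10.0" > "1.7.0"    # character comparison gets this one wrong

print(data.frame(version = as.character(v), newer = newer))
print(as_text)
```

Treated as versions, 1.10.0 is later than 1.7.0, while the character comparison says the opposite.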


                                    10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book, An introduction to Psychometric Theory with Applications in R (Revelle, in prep), available at http://personality-project.org/r/book. More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult A short guide to R at http://personality-project.org/r/r.guide.html.

                                    11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                    References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.



3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but groups the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
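As a rough illustration of the normal-theory intervals described above, the following base R sketch computes the mean, standard error, and 95% limits for each column of a small data frame. The data and the helper name ci_by_variable are hypothetical. Taking the critical value from the t distribution is an assumption about how such limits are commonly formed, not a statement about the package's internals.

```r
# Hypothetical scores on two variables (illustrative data only)
scores <- data.frame(a = c(2, 4, 4, 5, 7, 8),
                     b = c(10, 12, 11, 15, 14, 16))

ci_by_variable <- function(df, alpha = 0.05) {
  n    <- colSums(!is.na(df))                       # observations per variable
  m    <- colMeans(df, na.rm = TRUE)                # means
  se   <- apply(df, 2, sd, na.rm = TRUE) / sqrt(n)  # standard errors of the mean
  crit <- qt(1 - alpha / 2, df = n - 1)             # two-sided critical value
  data.frame(mean = m, se = se,
             lower = m - crit * se, upper = m + crit * se)
}

limits <- ci_by_variable(scores)
print(round(limits, 2))
```

Because the limits are the mean plus or minus a multiple of the standard error, they are symmetric about the mean whatever the shape of the data.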

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. A discussion of the problems in presenting data this way is at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which are always assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

[Figure: "0.95 confidence limits"; x axis "Independent Variable" (bfagree, bfcon, bfext, bfneur, bfopen), y axis "Dependent Variable".]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+ labels=c("Male","Female"),ylab="SAT score",xlab="")

[Figure: "0.95 confidence limits"; bars for Male and Female, y axis "SAT score" running from 200 to 800.]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+ main="Proportion of sample by education level")

[Figure: "Proportion of sample by education level"; x axis "Level of Education" (M 0 through M 5), y axis "Proportion of Education Level" running from 0.00 to 0.30.]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds row wise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
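The three ways of forming proportions differ only in the denominator N used for each cell. The base R sketch below applies the standard error of a proportion, sqrt(pq/N), to a made-up 2 x 3 table of counts. The counts and the helper name prop_se are hypothetical, and this is an illustration of the formula rather than the package's own code.

```r
# Hypothetical counts: rows are gender, columns are education level
counts <- matrix(c(10, 20, 30,
                   20, 40, 80), nrow = 2, byrow = TRUE,
                 dimnames = list(c("M", "F"), c("0", "1", "2")))

prop_se <- function(tab, way = c("both", "rows", "columns")) {
  way <- match.arg(way)
  # Denominator N for each cell: grand total, row total, or column total
  N <- switch(way,
              both    = matrix(sum(tab), nrow(tab), ncol(tab)),
              rows    = matrix(rowSums(tab), nrow(tab), ncol(tab)),
              columns = matrix(colSums(tab), nrow(tab), ncol(tab), byrow = TRUE))
  p <- tab / N
  list(p = p, se = sqrt(p * (1 - p) / N))   # sigma_p = sqrt(pq/N)
}

both <- prop_se(counts, "both")
print(round(both$p, 3))
print(round(both$se, 3))
```

Calling prop_se(counts, "rows") or prop_se(counts, "columns") gives proportions that sum to one within each row or column instead of over the whole table.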


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure: two panels. Left: "Movies effect on arousal", Energetic Arousal (x) by Tense Arousal (y). Right: "Movies effect on affect", Positive Affect (x) by Negative Affect (y). Each panel shows the Sad, Horror, Neutral, and Happy film conditions.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
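For readers who want to see what such a display involves, a rough base R equivalent is sketched here. It is an illustration of the idea, not the code used by lowerCor, and it uses the attitude data set that ships with R as a stand-in for sat.act:

```r
# attitude ships with base R; it stands in for sat.act in this sketch
r <- cor(attitude, use = "pairwise", method = "pearson")

lower <- round(r, 2)
lower[upper.tri(lower)] <- NA     # blank out everything above the diagonal
print(lower, na.print = "")       # show only the lower off diagonal part
```

The full matrix r is still available for further analysis; only the printed copy is truncated.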

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
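The combined displays are simple matrix bookkeeping. The following sketch reproduces the idea in base R under stated assumptions: two subsets of the built-in mtcars data stand in for the male and female groups, and the difference is taken as the first matrix minus the second, which matches the signs in the output above. It is not the lowerUpper source code.

```r
# Two groups from the base R mtcars data stand in for males and females
vars  <- c("mpg", "disp", "hp", "wt")
lower <- cor(mtcars[mtcars$am == 0, vars])   # shown below the diagonal
upper <- cor(mtcars[mtcars$am == 1, vars])   # shown above the diagonal

# One group below the diagonal, the other above it
both <- lower
both[upper.tri(both)] <- upper[upper.tri(upper)]
diag(both) <- NA

# First group below the diagonal, (first - second) above it
diffs <- lower
diffs[upper.tri(diffs)] <- (lower - upper)[upper.tri(lower)]
diag(diffs) <- NA

round(both, 2)
round(diffs, 2)
```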

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
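To make the Holm correction concrete, here is a small sketch using p.adjust from the stats package. The four raw probabilities are made up for illustration and are not taken from Table 1.

```r
# Hypothetical raw (unadjusted) two-tailed probabilities
p_raw <- c(0.001, 0.012, 0.03, 0.2)

# Holm: multiply the smallest p by m, the next by m - 1, and so on,
# then force the adjusted values to be non-decreasing (and at most 1)
p_holm <- p.adjust(p_raw, method = "holm")

data.frame(raw = p_raw, holm = p_holm)
```

With four tests the smallest probability is multiplied by 4, the next by 3, and so on, so a raw value of 0.03 no longer passes a 0.05 criterion once it is adjusted.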

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),


> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)  #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23
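The first two cases are easy to check by hand. The sketch below recomputes them in base R with the textbook formulas (a t test on n - 2 degrees of freedom, and Fisher's r to z transformation with standard error 1/sqrt(n - 3)). That r.test uses exactly these steps internally is an assumption, although the results agree with the output shown above to the printed precision.

```r
# Case 1: significance and confidence interval of a single correlation
n <- 50
r <- 0.3
t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_t     <- 2 * pt(-abs(t_value), df = n - 2)
ci      <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

# Case 2: difference between two independent correlations
n1 <- 30
n2 <- 30
r12 <- 0.4
r34 <- 0.6
z_value <- (atanh(r12) - atanh(r34)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p_z     <- 2 * pnorm(-abs(z_value))

round(c(t = t_value, p = p_t, lower = ci[1], upper = ci[2]), 3)
round(c(z = abs(z_value), p = p_z), 2)
```

Cases 3 and 4 involve the covariances among the correlations themselves and are best left to r.test.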

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
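The logic of this test can be sketched in a few lines of base R. The sketch assumes the Fisher z form of the statistic, (n - 3) times the sum of the squared z transformed off diagonal correlations, with p(p - 1)/2 degrees of freedom; it is meant to illustrate the idea rather than to reproduce cortest, and it uses the attitude data set from base R rather than sat.act.

```r
# attitude ships with base R; it stands in for sat.act in this sketch
r <- cor(attitude)
n <- nrow(attitude)
p <- ncol(r)

z       <- atanh(r[lower.tri(r)])     # Fisher z of each unique correlation
chi2    <- (n - 3) * sum(z^2)         # assumed form of the test statistic
df      <- p * (p - 1) / 2            # number of unique off diagonal elements
p_value <- pchisq(chi2, df, lower.tail = FALSE)

c(chi2 = chi2, df = df, p = p_value)
```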

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
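How much φ underestimates the latent correlation is easy to see by simulation. The sketch below is base R only: it draws bivariate normal data with a latent correlation of 0.5, cuts both variables at 0, and compares φ with the latent value. For cuts at the median the expected φ is (2/π) arcsin(ρ), which is 0.33 for ρ = 0.5. The seed and sample size are arbitrary choices.

```r
set.seed(17)
n   <- 100000
rho <- 0.5

# Bivariate normal data with population correlation rho
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Dichotomize both variables at 0 and correlate the 0/1 scores
phi <- cor(as.numeric(x > 0), as.numeric(y > 0))

round(c(latent = cor(x, y), phi = phi, expected_phi = 2 / pi * asin(rho)), 3)
```

The tetrachoric correlation works in the other direction: given the observed two by two table, it estimates the latent ρ.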

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
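The smoothing steps just described can be sketched in base R. This is an illustration of the general idea (raise the offending eigen values, rescale, rebuild), not the cor.smooth source; the function name smooth_sketch, the tiny floor value, and the improper example matrix are all hypothetical.

```r
# A hypothetical eigen value smoother; not the psych implementation
smooth_sketch <- function(R, tiny = 1e-8) {
  e    <- eigen(R, symmetric = TRUE)
  vals <- e$values
  if (min(vals) < tiny) {
    vals[vals < tiny] <- tiny                    # make small/negative values positive
    vals <- vals * length(vals) / sum(vals)      # rescale to sum to the number of variables
    R <- e$vectors %*% diag(vals, nrow = length(vals)) %*% t(e$vectors)
    R <- cov2cor(R)                              # restore a unit diagonal
  }
  R
}

# An improper "correlation" matrix (hypothetical values, not the burt data)
bad <- matrix(c( 1.0, 0.9, -0.9,
                 0.9, 1.0,  0.9,
                -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

min(eigen(bad, symmetric = TRUE)$values)         # negative: not positive semi-definite
smoothed <- smooth_sketch(bad)
round(smoothed, 2)
```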

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()

[Figure: a bivariate normal distribution (rho = 0.5, phi = 0.33) cut at X = τ and Y = Τ into four quadrants (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with the marginal normal densities, dnorm(x), and their cut points shown along the margins.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))

[Figure: perspective plot of the bivariate density (axes x, y, z), titled "Bivariate density, rho = 0.5".]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
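Equation 1 is an exact identity for sample data, which can be verified with a few lines of base R. In this sketch the data are simulated (the group structure, effect sizes, and seed are arbitrary assumptions), the between group scores are the group means assigned to each individual, and the within group scores are the deviations from those means.

```r
set.seed(42)
g  <- rep(1:10, each = 20)            # 10 groups of 20 individuals
gm <- rnorm(10)[g]                    # a group level effect
x  <- gm + rnorm(200)
y  <- 0.5 * gm - 0.3 * x + rnorm(200)

xb <- ave(x, g)                       # between: group means, one per individual
yb <- ave(y, g)
xw <- x - xb                          # within: deviations from the group means
yw <- y - yb

r_xy    <- cor(x, y)
part_wg <- cor(x, xw) * cor(y, yw) * cor(xw, yw)   # eta_xwg * eta_ywg * r_xywg
part_bg <- cor(x, xb) * cor(y, yb) * cor(xb, yb)   # eta_xbg * eta_ybg * r_xybg

c(r_xy = r_xy, within = part_wg, between = part_bg, sum = part_wg + part_bg)
```

In these simulated data the within and between correlations differ in sign, which shows why the overall correlation by itself says little about either level.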

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.69              0.63              0.50              0.58
     Letter.Group
             0.48
multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.48              0.40              0.25              0.34
     Letter.Group
             0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.59              0.58              0.49              0.58
     Letter.Group
             0.45
Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.34              0.34              0.24              0.33
     Letter.Group
             0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
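The beta weights, squared multiple correlation, and VIF values in this kind of output follow directly from the correlation matrix. The sketch below shows the standard matrix algebra in base R and checks it against lm on standardized raw data. It uses the attitude data set from base R (one criterion, three predictors) rather than Thurstone, and it is not the setCor source code.

```r
# attitude ships with base R; column 1 is the criterion, 2:4 the predictors
R <- cor(attitude)
y <- 1
x <- 2:4

beta <- solve(R[x, x], R[x, y])       # standardized regression weights
R2   <- sum(beta * R[x, y])           # squared multiple correlation
vif  <- diag(solve(R[x, x]))          # 1/(1 - SMC) for each predictor

# The same model fit to standardized raw data with lm
z   <- as.data.frame(scale(attitude))
fit <- lm(rating ~ complaints + privileges + learning, data = z)

round(rbind(from_R = beta, from_lm = coef(fit)[-1]), 3)
round(c(R2_from_R = R2, R2_from_lm = summary(fit)$r.squared), 3)
```

Because only R is needed, the same calculation can be applied to a published correlation matrix when the raw data are not available.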

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.58              0.46              0.21              0.18
     Letter.Group
             0.30
multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
            0.331             0.210             0.043             0.032
     Letter.Group
            0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.44              0.35              0.17              0.14
     Letter.Group
             0.26
Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.19              0.12              0.03              0.02
     Letter.Group
             0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees
Four.Letter.Words              0.52     0.11          0.09      0.06
Suffixes                       0.11     0.60         -0.01      0.01
Letter.Series                  0.09    -0.01          0.75      0.28
Pedigrees                      0.06     0.01          0.28      0.66
Letter.Group                   0.13     0.03          0.37      0.20
                  Letter.Group
Four.Letter.Words         0.13
Suffixes                  0.03
Letter.Series             0.37
Pedigrees                 0.20
Letter.Group              0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x_1, x_2, ..., x_i) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
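The path logic above can be sketched in base R without the psych package. This is only an illustration under stated assumptions: the data are simulated, the variable names x, m, and y and the effect sizes are invented, and the percentile bootstrap shown here is one common choice rather than necessarily what mediate does internally.

```r
# Simulated data; names and effect sizes are illustrative assumptions only.
set.seed(42)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)            # x -> m (path a)
y <- 0.4 * m + 0.2 * x + rnorm(n)  # m -> y (path b) plus a direct x -> y path
dat <- data.frame(x, m, y)

# Indirect effect ab from two ordinary regressions.
indirect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]      # x -> m
  b <- coef(lm(y ~ m + x, data = d))["m"]  # m -> y, controlling for x
  unname(a * b)
}

total  <- unname(coef(lm(y ~ x, data = dat))["x"])      # effect of x ignoring m
direct <- unname(coef(lm(y ~ m + x, data = dat))["x"])  # effect of x with m held constant
ab     <- indirect(dat)

# Percentile bootstrap of ab: resample rows with replacement and re-estimate.
boot_ab <- replicate(1000, indirect(dat[sample(n, replace = TRUE), ]))
ci <- quantile(boot_ab, c(0.025, 0.975))

round(c(total = total, direct = direct, ab = ab), 2)
round(ci, 2)
```

With ordinary least squares the total effect equals the direct effect plus ab, so the three printed estimates can be checked against each other; the bootstrap interval indicates whether ab is reliably different from zero.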


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
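As a quick arithmetic check on the output above, the indirect effect should be the product of the a and b estimates, and the total effect should be the direct effect plus the indirect effect. The values below are copied from the printed (rounded) output, so agreement is only to two decimals.

```r
# Rounded estimates copied from the mediate output above.
a        <- 0.82   # THERAPY -> ATTRIB
b        <- 0.40   # ATTRIB -> SATIS
c_total  <- 0.76   # total effect of THERAPY on SATIS
c_direct <- 0.43   # direct effect of THERAPY after removing ATTRIB

ab <- a * b
round(ab, 2)             # 0.33, the reported indirect effect
round(c_direct + ab, 2)  # 0.76, the reported total effect
```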

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000.

> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model". THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, THERAPY to SATIS c = 0.76 and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models", with THERAPY and ATTRIB predicting SATIS.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
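A minimal base R sketch of this computation follows. It assumes the standard canonical-correlation matrix, Rxx^-1 Rxy Ryy^-1 Ryx, whose eigenvalues are the squared canonical correlations. The mtcars data and the split into an x set and a y set are arbitrary choices made only for illustration, and the code is not taken from setCor.

```r
# Arbitrary illustrative split of a built-in data set into two variable sets.
x <- c("hp", "wt", "qsec")   # "predictor" set (assumption for the example)
y <- c("mpg", "drat")        # "criterion" set (assumption for the example)
R <- cor(mtcars[, c(x, y)])

Rxx <- R[x, x]
Ryy <- R[y, y]
Rxy <- R[x, y]

# Eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations.
M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M, only.values = TRUE)$values)

# Only min(p, q) eigenvalues can be non-zero; keep those.
lambda <- lambda[seq_len(min(length(x), length(y)))]

set_R2 <- 1 - prod(1 - lambda)   # set correlation R2
round(lambda, 3)
round(set_R2, 3)
```

The eigenvalues can be cross-checked against the squared canonical correlations returned by cancor in the stats package.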

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic, based upon the average canonical correlation, might then be more appropriate.
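The difference between the two statistics can be seen by applying both to the squared canonical correlations printed in the two setCor outputs in this document. The helper name cohen_R2 is hypothetical, and the inputs are the rounded values from those printouts.

```r
# Squared canonical correlations as printed in the two setCor outputs.
lambda_first  <- c(0.405, 0.023)
lambda_satact <- c(0.050, 0.033, 0.008)

# Cohen's set correlation: 1 minus the product of (1 - lambda_i).
cohen_R2 <- function(lambda) 1 - prod(1 - lambda)

round(cohen_R2(lambda_first), 2)   # 0.42
round(mean(lambda_first), 2)       # 0.21
round(cohen_R2(lambda_satact), 2)  # 0.09
round(mean(lambda_satact), 2)      # 0.03
```

In the first case a single large canonical correlation drives the set correlation to 0.42, while the average squared canonical correlation is half that.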

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
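To see why the raw data are not needed, the following base R sketch recovers regression slopes and R2 from a covariance matrix alone and compares them with lm. It uses the built-in mtcars data as a stand-in, and it illustrates the underlying algebra rather than the setCor implementation.

```r
# Stand-in data: predict mpg from three other variables in mtcars.
x <- c("hp", "wt", "qsec")
y <- "mpg"
C <- cov(mtcars[, c(x, y)])

# Raw slopes solve Cxx b = Cxy; R2 is the explained share of var(y).
beta_raw <- solve(C[x, x], C[x, y])
R2 <- sum(beta_raw * C[x, y]) / C[y, y]

# Standardized slopes come from the correlation matrix in the same way.
Rm <- cov2cor(C)
beta_std <- solve(Rm[x, x], Rm[x, y])

# The same model fitted from the raw data, for comparison.
fit <- lm(mpg ~ hp + wt + qsec, data = mtcars)
rbind(from_cov = as.numeric(beta_raw), from_lm = as.numeric(coef(fit)[x]))
c(R2_from_cov = R2, R2_from_lm = summary(fit)$r.squared)
```

Significance tests additionally need the number of observations, which is why it must be supplied when only a matrix is given.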

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: first, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act, mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: path diagram titled "Moderation model", with ACT, gender, and ACTXgndr predicting SATQ through education.]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender education age
  1.01      1.45 1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R² is the same independent of the direction of the relationship.
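The symmetry is easy to see from an equivalent determinant form of the set correlation, R² = 1 - |R| / (|Rxx| |Ryy|), a standard identity that is not stated in the text above. The sketch below uses a hypothetical helper, set_cor_det, and an arbitrary split of the built-in mtcars data; swapping the roles of the two sets leaves the result unchanged.

```r
# Hypothetical helper: set correlation R2 from the determinant identity.
set_cor_det <- function(R, x, y) {
  all_vars <- c(x, y)
  1 - det(R[all_vars, all_vars]) / (det(R[x, x, drop = FALSE]) * det(R[y, y, drop = FALSE]))
}

# Arbitrary illustrative variable sets from a built-in data set.
set1 <- c("hp", "wt", "qsec")
set2 <- c("mpg", "drat")
R <- cor(mtcars[, c(set1, set2)])

forward  <- set_cor_det(R, x = set1, y = set2)  # set1 predicting set2
backward <- set_cor_det(R, x = set2, y = set1)  # set2 predicting set1
c(forward = forward, backward = backward)
```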

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00


                                      7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
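Several of these helpers wrap short formulas. The base R sketch below shows what a few of them compute; the function names fisher_z, geo_mean, harm_mean, and super_matrix are hypothetical stand-ins written for this illustration, not the package's own code, and they skip the argument checking and missing-data handling that package functions usually provide.

```r
# Fisher z transformation of a correlation: 0.5 * log((1 + r) / (1 - r)).
fisher_z <- function(r) atanh(r)

# Geometric mean (for positive, ratio-like data) and harmonic mean (for rates).
geo_mean  <- function(x) exp(mean(log(x)))
harm_mean <- function(x) 1 / mean(1 / x)

# Block "super matrix": A on the top left, B on the lower right, 0s elsewhere.
super_matrix <- function(A, B) {
  rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
        cbind(matrix(0, nrow(B), ncol(A)), B))
}

fisher_z(0.5)           # 0.5493061
geo_mean(c(1, 2, 4))    # 2
harm_mean(c(1, 2, 4))   # 1.714286
super_matrix(diag(2), matrix(1, nrow = 1, ncol = 2))  # a 3 x 4 matrix
```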

                                      8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0",package="psych")


                                      10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                      11 SessionInfo

                                      This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                      References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd ed edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis. 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):pp. 65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.



> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10],epi.bfi$epilie<4)

[Figure 6 plot: 0.95 confidence limits. x-axis: Independent Variable (bfagree, bfcon, bfext, bfneur, bfopen); y-axis: Dependent Variable.]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+ labels=c("Male","Female"),ylab="SAT score",xlab="")

[Figure 7 plot: 0.95 confidence limits. x-axis: Male, Female; y-axis: SAT score (200 to 800).]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+ main="Proportion of sample by education level")

[Figure 8 plot: "Proportion of sample by education level". x-axis: Level of Education (M 0 to M 5); y-axis: Proportion of Education Level (0.00 to 0.30).]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
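The three way options described in the caption correspond to the three usual ways of normalizing a two-way table. The following is only a minimal base R sketch: the counts are made up (they are not the sat.act data), and the binomial standard error sqrt(p(1-p)/n) is an assumption for illustration that may differ in detail from what error.bars.tab computes.

```r
# Hypothetical 2 x 3 table of counts (rows: gender, columns: education level)
counts <- matrix(c(10, 20, 30,
                   15, 25, 50),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("M", "F"),
                                 education = c("0", "1", "2")))

n <- sum(counts)

# Analogue of way = "both": each cell as a proportion of the grand total
p_both <- prop.table(counts)

# Analogue of way = "rows": each cell as a proportion of its row total
p_rows <- prop.table(counts, margin = 1)

# Analogue of way = "columns": each cell as a proportion of its column total
p_cols <- prop.table(counts, margin = 2)

# Assumed binomial standard error for the grand-total proportions
se_both <- sqrt(p_both * (1 - p_both) / n)

print(round(p_both, 3))
print(round(se_both, 3))
```

Only the denominator changes between the three versions, which is why the same table can tell quite different stories depending on the option chosen.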


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure 9 plot, two panels. Left: "Movies effect on arousal", x-axis: Energetic Arousal, y-axis: Tense Arousal. Right: "Movies effect on affect", x-axis: Positive Affect, y-axis: Negative Affect. Points labeled Sad, Horror, Neutral, Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
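The descriptive statistics behind a display like this are simply a mean, a standard error and a sample size per group on each of the two measures. The following base R sketch shows that computation on simulated data; the data frame d, the column names and the helper se are hypothetical and are not part of psych, and the affect data are not used.

```r
# Hypothetical data: two measures observed under three conditions
set.seed(42)
d <- data.frame(
  group = rep(c("a", "b", "c"), times = c(20, 30, 25)),
  x = rnorm(75, mean = 10, sd = 2),
  y = rnorm(75, mean = 5, sd = 1)
)

# Standard error of the mean (illustrative helper)
se <- function(v) sd(v) / sqrt(length(v))

# One row per group: n, mean and standard error for each measure
by_group <- split(d, d$group)
group_stats <- data.frame(
  n      = vapply(by_group, nrow, integer(1)),
  x_mean = vapply(by_group, function(g) mean(g$x), numeric(1)),
  x_se   = vapply(by_group, function(g) se(g$x), numeric(1)),
  y_mean = vapply(by_group, function(g) mean(g$y), numeric(1)),
  y_se   = vapply(by_group, function(g) se(g$y), numeric(1))
)
print(group_stats)
```

Each row gives the center of one point (x_mean, y_mean), the half-widths of its error bars in both directions, and the group size that could be used to scale the plotting symbol.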


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
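For readers who want to see what this kind of display involves, a rough base R equivalent can be sketched as follows. It uses the built-in attitude data set as a stand-in (an assumption made so the example runs without psych) and is not the lowerCor implementation.

```r
# Built-in 'attitude' data stands in for sat.act, which ships with psych
r <- cor(attitude, use = "pairwise.complete.obs", method = "pearson")

# Round to two digits and blank out everything above the diagonal
shown <- format(round(r, 2), nsmall = 2)
shown[upper.tri(shown)] <- ""
print(noquote(shown))
```

The full matrix r is still available for further computation; only the printed copy is reduced to the lower triangle.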

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA
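The layout of both displays above can be mimicked in a few lines of base R. This is only a sketch of the idea: the function name combine_triangles is hypothetical (it is not a psych function), and two species from the built-in iris data stand in for the male and female subsets.

```r
# Combine two square matrices: 'lower' supplies the cells below the diagonal,
# 'upper' supplies the cells above it. With diff = TRUE the cells above the
# diagonal hold lower - upper instead. Hypothetical helper, not part of psych.
combine_triangles <- function(lower, upper, diff = FALSE) {
  stopifnot(all(dim(lower) == dim(upper)), nrow(lower) == ncol(lower))
  out <- lower
  above <- upper.tri(out)
  out[above] <- if (diff) lower[above] - upper[above] else upper[above]
  diag(out) <- NA
  out
}

# Two groups from the built-in 'iris' data stand in for males and females
m1 <- cor(iris[iris$Species == "setosa", 1:4])
m2 <- cor(iris[iris$Species == "versicolor", 1:4])

both_sketch <- combine_triangles(m1, m2)
diff_sketch <- combine_triangles(m1, m2, diff = TRUE)

print(round(both_sketch, 2))
print(round(diff_sketch, 2))
```

As in the tables above, the diagonal is set to NA because it carries no information once the two triangles come from different sources.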

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, the raw probability values are not corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
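To make the Holm adjustment concrete, here is a minimal base R sketch using p.adjust from the stats package. The four p values are made up for illustration and are not taken from sat.act; the hand computation is included only to show what the adjustment does.

```r
# Four raw two-tailed p values (made-up numbers, for illustration only)
p_raw <- c(0.01, 0.02, 0.03, 0.04)

# Holm adjustment as implemented in the stats package
p_holm <- p.adjust(p_raw, method = "holm")

# The same adjustment written out by hand: multiply the i-th smallest
# p value by (m - i + 1), keep the running maximum, and cap at 1
m <- length(p_raw)
ord <- order(p_raw)
by_hand <- pmin(1, cummax((m - seq_len(m) + 1) * p_raw[ord]))[order(ord)]

print(rbind(raw = p_raw, holm = p_holm, by_hand = by_hand))
```

The smallest p value is multiplied by the number of tests, the next by one fewer, and so on, which is why the adjusted values above the diagonal in Table 1 are never smaller than the raw values below it.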

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18 with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)  #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5, r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2 with probability 0.23
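The first three cases can be checked by hand in base R. The sketch below is not the r.test source code; it assumes the usual textbook formulas (the t test for a single correlation with a Fisher z confidence interval, the Fisher z test for two independent correlations, and Williams' t as discussed by Steiger (1980) for case A) and applies them to the same inputs as the calls above.

```r
# Case 1: a single correlation, r = .3 with n = 50
n1 <- 50; r <- 0.3
t1 <- r * sqrt(n1 - 2) / sqrt(1 - r^2)          # t with n - 2 df
p1 <- 2 * pt(-abs(t1), df = n1 - 2)             # two-tailed probability
ci1 <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n1 - 3))  # Fisher z interval

# Case 2: two independent correlations, .4 and .6, each with n = 30
na <- 30; nb <- 30
z2 <- (atanh(0.4) - atanh(0.6)) / sqrt(1 / (na - 3) + 1 / (nb - 3))
p2 <- 2 * pnorm(-abs(z2))

# Case 3: two dependent correlations sharing variable 1 (Williams' t)
n3 <- 103; r12 <- 0.4; r13 <- 0.5; r23 <- 0.1
detR <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23  # determinant of the 3 x 3 matrix
rbar <- (r12 + r13) / 2
t3 <- (r12 - r13) * sqrt((n3 - 1) * (1 + r23) /
        (2 * ((n3 - 1) / (n3 - 3)) * detR + rbar^2 * (1 - r23)^3))
p3 <- 2 * pt(-abs(t3), df = n3 - 3)

print(round(c(t1 = t1, p1 = p1, lower = ci1[1], upper = ci1[2]), 3))
print(round(c(z2 = abs(z2), p2 = p2), 2))
print(round(c(t3 = t3, p3 = p3), 2))
```

The rounded t and z values agree with the printed r.test output for cases 1 to 3. Case B needs the full asymptotic covariance of the two correlations and is not reproduced here.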

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42 with df = 15 with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
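How much φ underestimates the latent correlation can be worked out in closed form for one special case. The sketch below assumes both variables are cut exactly at their means (τ = Τ = 0), where the quadrant probabilities of the bivariate normal are known; for other cut points the tetrachoric correlation has to be found numerically, and this is not how the psych functions compute it.

```r
rho <- 0.5                               # latent (bivariate normal) correlation

# Cell probabilities of the 2 x 2 table when both cuts are at 0
p11 <- 0.25 + asin(rho) / (2 * pi)       # P(X > 0 and Y > 0)
p00 <- p11                               # P(X < 0 and Y < 0), by symmetry
p10 <- 0.5 - p11                         # P(X > 0 and Y < 0)
p01 <- 0.5 - p11                         # P(X < 0 and Y > 0)

# phi is the Pearson r of the dichotomized variables (all margins are .5)
phi <- (p11 * p00 - p10 * p01) / sqrt(0.5 * 0.5 * 0.5 * 0.5)

# For median cuts this simplifies to (2 / pi) * asin(rho), which can be inverted
phi_closed <- (2 / pi) * asin(rho)
rho_back <- sin(pi * phi / 2)

print(round(c(phi = phi, phi_closed = phi_closed, rho_back = rho_back), 2))
```

With rho = 0.5 this gives phi = 0.33, the pair of values shown in the draw.tetra figure.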

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix assembled from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will also sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
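The smoothing steps described above can be sketched in base R. This is an illustration of the idea only, not the cor.smooth code: the 3 x 3 matrix is invented, and the floor used for negative eigen values (floor_val) is an arbitrary choice made for this example.

```r
# An invented "correlation" matrix that is not positive semi-definite
R <- matrix(c( 1.0,  0.9,  0.9,
               0.9,  1.0, -0.9,
               0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)

e <- eigen(R, symmetric = TRUE)
print(round(e$values, 2))               # one eigen value is negative

floor_val <- 1e-10                      # hypothetical small positive floor
vals <- pmax(e$values, floor_val)       # make every eigen value positive
vals <- vals * nrow(R) / sum(vals)      # rescale to sum to the number of variables

# Rebuild the matrix from the adjusted eigen values and restore a unit diagonal
S <- e$vectors %*% diag(vals) %*% t(e$vectors)
S <- cov2cor(S)

print(round(S, 2))
print(min(eigen(S, symmetric = TRUE)$values) > 0)
```

The off-diagonal values shrink toward zero (here from ±0.9 to ±0.5), which is the price of forcing the matrix to be a proper correlation matrix.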

                                        4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use

> draw.tetra()

[Plot: a bivariate normal distribution with X and Y running from -3 to 3, cut at τ on X and at Τ on Y into four quadrants (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with the marginal normal densities dnorm(x) shown for each variable. rho = 0.5, phi = 0.33]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Plot: perspective view of the bivariate density over x and y, with density on the z axis. Bivariate density, rho = 0.5]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \tag{1}$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values, or the group means.
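Equation 1 is an identity, which can be confirmed on simulated data with base R alone. The sketch below does not call statsBy; the data, group sizes, and variable names are invented, and the between group correlation is computed with each group mean repeated for every member of the group (that is, weighted by group size).

```r
set.seed(42)
g <- rep(1:5, each = 20)                 # 5 groups of 20 (hypothetical)
x <- rnorm(100) + g                      # x has real between group differences
y <- 0.5 * x - 0.3 * g + rnorm(100)

# Between part: the group mean assigned to every individual
x_bg <- ave(x, g); y_bg <- ave(y, g)
# Within part: deviations from the individual's own group mean
x_wg <- x - x_bg;  y_wg <- y - y_bg

# eta: correlation of the raw scores with their within and between parts
eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)

r_wg <- cor(x_wg, y_wg)                  # pooled within group correlation
r_bg <- cor(x_bg, y_bg)                  # weighted between group correlation

r_xy <- cor(x, y)
rebuilt <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg

print(c(observed = r_xy, rebuilt = rebuilt))
```

The two numbers agree to rounding error because the within and between parts of each variable are uncorrelated with each other.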

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
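Why raw data are not needed can be seen from the matrix algebra. The sketch below uses base R and an invented 2 predictor example; it shows the standard normal equations for standardized variables and is not taken from the setCor source.

```r
# Invented correlations: two predictors correlated .5, with validities .6 and .4
Rxx <- matrix(c(1.0, 0.5,
                0.5, 1.0), nrow = 2, byrow = TRUE,
              dimnames = list(c("x1", "x2"), c("x1", "x2")))
rxy <- c(x1 = 0.6, x2 = 0.4)

beta <- solve(Rxx, rxy)        # standardized beta weights: Rxx^-1 rxy
R2 <- sum(beta * rxy)          # squared multiple correlation
vif <- diag(solve(Rxx))        # variance inflation factors, 1/(1 - SMC)

print(round(beta, 2))
print(round(c(R = sqrt(R2), R2 = R2), 2))
print(round(vif, 2))
```

The same three quantities (beta weights, multiple R and R2, and the VIF) appear in the setCor output that follows.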

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.69            0.63            0.50            0.58
    LetterGroup
           0.48
multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.48            0.40            0.25            0.34
    LetterGroup
           0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.59            0.58            0.49            0.58
    LetterGroup
           0.45
Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.34            0.34            0.24            0.33
    LetterGroup
           0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.58            0.46            0.21            0.18
    LetterGroup
           0.30
multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
          0.331           0.210           0.043           0.032
    LetterGroup
          0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.44            0.35            0.17            0.14
    LetterGroup
           0.26
Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.19            0.12            0.03            0.02
    LetterGroup
           0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees
FourLetterWords            0.52     0.11         0.09      0.06
Suffixes                   0.11     0.60        -0.01      0.01
LetterSeries               0.09    -0.01         0.75      0.28
Pedigrees                  0.06     0.01         0.28      0.66
LetterGroup                0.13     0.03         0.37      0.20
                LetterGroup
FourLetterWords        0.13
Suffixes               0.03
LetterSeries           0.37
Pedigrees              0.20
LetterGroup            0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
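A percentile bootstrap of the ab effect can be sketched with lm and sample from base R. This is not the mediate function: the data are simulated, the helper name indirect and the 500 resamples are arbitrary choices for illustration, and the printed output of mediate labels the total effect c and the direct effect with m in the model c'.

```r
set.seed(1)
n <- 200
x <- rnorm(n)                            # simulated predictor
m <- 0.5 * x + rnorm(n)                  # mediator, true a = .5
y <- 0.4 * m + 0.2 * x + rnorm(n)        # criterion, true b = .4, direct = .2
d <- data.frame(x, m, y)

# Hypothetical helper: estimate a (x -> m) and b (m -> y, holding x) and return ab
indirect <- function(dat) {
  a <- coef(lm(m ~ x, data = dat))["x"]
  b <- coef(lm(y ~ m + x, data = dat))["m"]
  unname(a * b)
}

ab <- indirect(d)
direct <- unname(coef(lm(y ~ m + x, data = d))["x"])   # effect of x with m in the model
total <- unname(coef(lm(y ~ x, data = d))["x"])        # effect of x ignoring m

# Resample rows with replacement and recompute ab each time
boot_ab <- replicate(500, indirect(d[sample(n, replace = TRUE), ]))
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(c(ab = ab, direct = direct, total = total), 3))
print(round(ci, 3))
```

For ordinary least squares the total effect equals the direct effect plus ab exactly, so the bootstrap is needed only for the sampling distribution of ab, which is not normal in small samples.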


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76 SE = 0.31 t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 SE = 0.32 t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.04 Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
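The pieces of this output fit together by simple arithmetic, which can be checked in R. The numbers below are copied from the printed (rounded) estimates, so agreement is only to two decimals; the variable names are chosen for this check.

```r
a <- 0.82          # THERAPY -> ATTRIB
b <- 0.40          # ATTRIB -> SATIS, with THERAPY in the model
c_prime <- 0.43    # direct effect of THERAPY with ATTRIB in the model
c_total <- 0.76    # total effect of THERAPY

ab <- a * b                               # indirect effect
print(round(ab, 2))                       # the reported indirect effect
print(round(c_prime + ab, 2))             # direct + indirect = total effect
```

The bootstrapped confidence interval for ab (0.04 to 0.69) does not include zero, even though the direct effect with ATTRIB in the model (0.43, probability = 0.19) is not reliably different from zero.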

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c( "SATV","SATQ"),x = c("education","age"),data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c( "SATV" ),x = c("education","age"),m= "ACT",data =sat.act,std=TRUE,n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50

> mediate.diagram(preacher)

[Plot: Mediation model. THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS c = 0.76, c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Plot: Regression Models. THERAPY and ATTRIB predicting SATIS, with path values 0.43, 0.4, and 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the $i$th eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{xy}'$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic, based upon the average canonical correlation, might then be more appropriate.
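The inflation described above is easy to reproduce. The following is a minimal base R sketch under stated assumptions: the data are simulated so that only one variable in the second set is related to the first set, the object names are illustrative rather than psych internals, and the eigenvalues are taken from the standard canonical correlation matrix (the product of the inverse of Rxx, Rxy, the inverse of Ryy, and the transpose of Rxy).

```r
# Sketch only: simulated data; object names are illustrative, not psych internals.
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), n, 3)
# Only the first y variable is strongly related to the x set; the others are noise.
y <- cbind(x[, 1] + 0.3 * rnorm(n), rnorm(n), rnorm(n))

R   <- cor(cbind(x, y))
ix  <- 1:3
iy  <- 4:6
Rxx <- R[ix, ix]
Ryy <- R[iy, iy]
Rxy <- R[ix, iy]

# The eigenvalues of this matrix are the squared canonical correlations.
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- sort(Re(eigen(M)$values), decreasing = TRUE)

set_R2 <- 1 - prod(1 - lambda)   # Cohen's set correlation
avg_cc <- mean(lambda)           # average squared canonical correlation

# Cross-checks: stats::cancor() and the determinant form 1 - |R| / (|Rxx| |Ryy|).
lambda_cancor <- cancor(x, y)$cor^2
set_R2_det    <- 1 - det(R) / (det(Rxx) * det(Ryy))

round(c(set_R2 = set_R2, avg_cc = avg_cc), 2)
```

Because one pair of variables dominates, the set correlation sits close to the largest squared canonical correlation, while the average squared canonical correlation is much lower.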

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
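To see why the raw data are not needed, note that standardized regression weights and the squared multiple correlation depend only on the correlations. This is a minimal base R sketch of that idea, not the setCor implementation; the data are simulated and the names (d, px, beta) are illustrative.

```r
# Sketch only: simulated data; names are illustrative.
set.seed(7)
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$x3 <- 0.5 * d$x1 + rnorm(n)
d$y  <- 0.3 * d$x1 - 0.2 * d$x2 + 0.4 * d$x3 + rnorm(n)

R   <- cor(d)
px  <- c("x1", "x2", "x3")
Rxx <- R[px, px]             # predictor intercorrelations
rxy <- R[px, "y"]            # predictor-criterion correlations

beta <- solve(Rxx, rxy)      # standardized regression weights
R2   <- sum(beta * rxy)      # squared multiple correlation

# The same values come from the raw data once every variable is standardized.
fit <- lm(y ~ x1 + x2 + x3, data = as.data.frame(scale(d)))
all.equal(unname(beta), unname(coef(fit)[-1]))
all.equal(R2, summary(fit)$r.squared)
```

Significance tests additionally require the number of observations, which is why it must be supplied when only a matrix is given.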

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation, covariance, or raw data matrix, and thus will report standardized or raw β weights depending on the input. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the covariance matrix (computed with cov) rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

[Figure 18 content: output of mediate with a moderator, followed by its path diagram]

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act, mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  SE = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  SE = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  SE = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  SE = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  SE = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  SE = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Path diagram titled "Moderation model": ACT, gender, and ACTXgndr predict SATQ, with education as the mediator.]

Figure 18: Moderated multiple regression requires the raw data.

Compare the lm results above with the output from setCor:

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

Factor intercorrelations
                  MR1    MR2    MR3
MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
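The kind of conversion these functions perform can be sketched in a few lines of base R. The helper below, df_to_tabular, is hypothetical and is not part of psych; it only illustrates how a data frame becomes the rows of a LaTeX tabular. The two rows of loadings are taken from Table 2. For real work the psych functions named above (or xtable) are the better choice.

```r
# Sketch only: df_to_tabular() is a hypothetical helper, not a psych function.
df_to_tabular <- function(df, digits = 2, caption = "A table") {
  num  <- vapply(df, is.numeric, logical(1))
  body <- df
  # Format numeric columns to a fixed number of decimals.
  body[num] <- lapply(df[num], function(v) formatC(v, format = "f", digits = digits))
  rows  <- apply(as.matrix(body), 1, paste, collapse = " & ")
  align <- paste0("l", strrep("r", ncol(df)))
  c("\\begin{table}[htpb]",
    paste0("\\caption{", caption, "}"),
    paste0("\\begin{tabular}{", align, "}"),
    "\\hline",
    paste0(paste(c("Variable", names(df)), collapse = " & "), " \\\\"),
    "\\hline",
    paste0(rownames(df), " & ", rows, " \\\\"),
    "\\hline",
    "\\end{tabular}",
    "\\end{table}")
}

# Two rows of loadings copied from Table 2.
loadings <- data.frame(MR1 = c(0.91, 0.89), MR2 = c(-0.04, 0.06),
                       row.names = c("Sentences", "Vocabulary"))
out <- df_to_tabular(loadings, caption = "Two rows of Table 2")
writeLines(out)
```

Returning a character vector rather than printing directly makes it easy to write the result to a file with writeLines(out, "table.tex").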

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random: Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex: Is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex: Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor: One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz: Convert a correlation to the corresponding Fisher z score.

geometric.mean, also harmonic.mean: Find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa: Are typically used to find the reliability for raters.

headtail: Combines the head and tail functions to show the first and last lines of a data set or output.

topBottom: Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia: Calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep: Finds the probability of replication for an F, t, or r and estimates effect size.

partial.r: Partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection: Will correct correlations for restriction of range.

reverse.code: Will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix: Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
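Several of the helpers in this list wrap short computations. The following base R sketch shows the arithmetic behind three of them (the Fisher z transformation, the geometric and harmonic means, and the block layout that superMatrix is described as producing). It is an illustration under those descriptions, not the psych source code, and the object names are made up for the example.

```r
# Sketch only: base-R equivalents; consult the psych help pages for the real functions.
r <- 0.5
z <- 0.5 * log((1 + r) / (1 - r))   # Fisher r-to-z; identical to atanh(r)

v <- c(1, 2, 4, 8)
geo_mean  <- exp(mean(log(v)))      # geometric mean
harm_mean <- 1 / mean(1 / v)        # harmonic mean

# Block layout: A on the top left, B on the lower right, zeros elsewhere.
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 1)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))
dim(S)
```

The combined matrix has as many rows as A and B together and as many columns as A and B together, which is what makes the layout convenient for building scoring keys from separate pieces.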

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
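As an illustration of the multidimensional scaling use mentioned for cities, here is a minimal sketch using only base R. It assumes psych is not loaded, so the built-in eurodist road distances stand in for the cities matrix; the object names are illustrative.

```r
# Sketch only: eurodist (shipped with base R) stands in for psych's cities matrix.
fit    <- cmdscale(eurodist, k = 2, eig = TRUE)  # classical (metric) scaling
coords <- fit$points                             # one row of 2-D coordinates per city

# How well do the 2-D distances reproduce the original distances?
fitted_d <- dist(coords)
fit_cor  <- cor(as.vector(eurodist), as.vector(fitted_d))
fit_cor
```

A distance matrix such as cities would first need to be in a form cmdscale accepts (a dist object or a full symmetric matrix); the rest of the steps are the same.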

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

                                        Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

                                        3rd edition

                                        Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

                                        Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

                                        Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

                                        Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

                                        Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

                                        Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

                                        Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

                                        Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

                                        Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

                                        Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

                                        Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

                                        Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+ labels=c("Male","Female"),ylab="SAT score",xlab="")

[Figure: bar graph titled "0.95 confidence limits" showing SAT score (y axis, 200 to 800) for Male and Female.]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+ main="Proportion of sample by education level")

[Figure: "Proportion of sample by education level". The x axis is "Level of Education" (M 0 to M 5 shown); the y axis is "Proportion of Education Level" (0.00 to 0.30).]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
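The three ways of forming proportions from a table can be sketched in base R with prop.table. This is only an illustration under stated assumptions: the counts are made up (they are not the sat.act data), and the binomial standard error shown is one plausible choice that may differ from what error.bars.tab computes internally.

```r
# Hypothetical counts, made up for illustration (not the sat.act data)
tab <- matrix(c(10, 20, 30, 40,
                15, 25, 35, 25),
              nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("M", "F"),
                              education = c("0", "1", "2", "3")))

p_both <- prop.table(tab)      # each cell divided by the grand total
p_rows <- prop.table(tab, 1)   # each cell divided by its row total
p_cols <- prop.table(tab, 2)   # each cell divided by its column total

# Assumed binomial standard error of a proportion based on the grand total
se_both <- sqrt(p_both * (1 - p_both) / sum(tab))

print(round(p_both, 3))
print(round(se_both, 3))
```

With proportions of the grand total all cells sum to 1, whereas the row wise and column wise versions sum to 1 within each row or column.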

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure: two panels. Left: "Movies effect on arousal", with Energetic Arousal on the x axis and Tense Arousal on the y axis. Right: "Movies effect on affect", with Positive Affect on the x axis and Negative Affect on the y axis. The points in each panel are labeled Sad, Horror, Neutral and Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
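The idea behind such two dimensional displays (group means on two variables, each with its own error bar) can be sketched in base R. This is a minimal illustration, not the errorCircles or errorCrosses code: it uses the built-in iris data rather than the affect data, draws plus or minus one standard error, and does not scale the points by sample size.

```r
# Base R sketch: group means and standard errors on two variables,
# drawn as crosses (iris is used only as a stand-in data set)
se <- function(x) sd(x) / sqrt(length(x))
grp <- iris$Species

stats <- data.frame(
  group = levels(grp),
  mx  = as.numeric(tapply(iris$Sepal.Length, grp, mean)),
  sex = as.numeric(tapply(iris$Sepal.Length, grp, se)),
  my  = as.numeric(tapply(iris$Sepal.Width,  grp, mean)),
  sey = as.numeric(tapply(iris$Sepal.Width,  grp, se))
)

plot(stats$mx, stats$my, pch = 16,
     xlim = range(stats$mx) + c(-0.3, 0.3),
     ylim = range(stats$my) + c(-0.3, 0.3),
     xlab = "Sepal length", ylab = "Sepal width",
     main = "Group means with +/- 1 standard error")
# Horizontal bars: error in the x variable
arrows(stats$mx - stats$sex, stats$my, stats$mx + stats$sex, stats$my,
       angle = 90, code = 3, length = 0.05)
# Vertical bars: error in the y variable
arrows(stats$mx, stats$my - stats$sey, stats$mx, stats$my + stats$sey,
       angle = 90, code = 3, length = 0.05)
text(stats$mx, stats$my, stats$group, pos = 3)
```

Keeping the summary statistics in a data frame mirrors the two step use shown above, where the statistics are found once and then reused for a second display.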

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by displaying one of them below the diagonal and the difference between the two (the first minus the second) above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
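For readers who want to see what such a combined matrix involves, the following base R sketch builds one by hand. It is not the lowerUpper code: the helper combine_triangles is a hypothetical name introduced here, and the built-in mtcars data (split by transmission type) stand in for the two groups.

```r
# Base R sketch: one group's correlations below the diagonal, the other
# group's (or the difference between the two) above it.
# combine_triangles is a hypothetical helper, not a psych function.
groups  <- split(mtcars[, c("mpg", "hp", "wt", "qsec")], mtcars$am)
r_lower <- cor(groups[["0"]])   # automatic transmission
r_upper <- cor(groups[["1"]])   # manual transmission

combine_triangles <- function(lower, upper, diff = FALSE) {
  stopifnot(identical(dim(lower), dim(upper)))
  out   <- lower                 # lower triangle is kept from the first matrix
  above <- upper.tri(out)
  out[above] <- if (diff) lower[above] - upper[above] else upper[above]
  diag(out) <- NA                # the diagonal carries no information
  out
}

both  <- combine_triangles(r_lower, r_upper)
diffs <- combine_triangles(r_lower, r_upper, diff = TRUE)
print(round(both, 2))
print(round(diffs, 2))
```

The sign convention for the differences (first matrix minus second) follows the sat.act output above, where 0.61 - 0.52 = 0.09 for education and age.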

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
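The color coding just described can be sketched with the base R image function. This is only an illustration of the idea, not corPlot itself: the palette, the use of six mtcars variables, and the layout choices are all assumptions made for the example.

```r
# Base R sketch of a correlation heat map (not corPlot itself)
R <- cor(mtcars[, c("mpg", "cyl", "disp", "hp", "wt", "qsec")])
p <- ncol(R)

# Red for negative, white for zero, blue for positive correlations
pal <- colorRampPalette(c("red", "white", "blue"))(51)

# Reverse the column order so the diagonal runs from top left to bottom right
image(1:p, 1:p, R[, p:1], zlim = c(-1, 1), col = pal,
      axes = FALSE, xlab = "", ylab = "",
      main = "Correlations among six mtcars variables")
axis(1, at = 1:p, labels = rownames(R), las = 2)
axis(2, at = 1:p, labels = rev(colnames(R)), las = 2)

# Optionally show the value of each correlation in its cell
text(row(R), p + 1 - col(R), sprintf("%.2f", R))
```

Fixing zlim at -1 to 1 keeps the color scale comparable across different matrices, so that white always corresponds to a correlation of zero.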

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)).

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option
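The relation between the raw and the Holm adjusted probabilities can be illustrated with base R alone, using cor.test and p.adjust from the stats package. This is a sketch under stated assumptions rather than the corr.test code: four mtcars variables are used as stand-in data, and the adjustment is applied to the six unique pairs.

```r
# Base R sketch: raw and Holm adjusted p values for all pairwise correlations
vars  <- mtcars[, c("mpg", "hp", "wt", "qsec")]
pairs <- combn(names(vars), 2)            # one column per pair of variables

raw_p <- apply(pairs, 2, function(v) {
  cor.test(vars[[v[1]]], vars[[v[2]]])$p.value
})
names(raw_p) <- apply(pairs, 2, paste, collapse = "-")

# Holm: the smallest p is multiplied by the number of tests, the next by one
# fewer, and so on, with the results forced to be non-decreasing and <= 1
holm_p <- p.adjust(raw_p, method = "holm")

print(signif(cbind(raw = raw_p, holm = holm_p), 3))
```

An adjusted value is never smaller than the corresponding raw value, which is why entries above the diagonal in Table 1 are at least as large as their counterparts below it.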

Which test r.test performs depends upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18 with probability < 0.034
 and confidence interval 0.02 0.53

                                          2) For sample sizes of n and n2 (n2 = n if not specified) find the z of the difference betweenthe z transformed correlations divided by the standard error of the difference of two zscores

                                          gt rtest(3046)

                                          Correlation tests

                                          Callrtest(n = 30 r12 = 04 r34 = 06)

                                          Test of difference between two independent correlations

                                          z value 099 with probability 032

                                          3) For sample size n and correlations ra= r12 rb= r23 and r13 specified test for thedifference of two dependent correlations (Steiger case A)

                                          gt rtest(103451)

                                          Correlation tests

                                          Call[1] rtest(n = 103 r12 = 04 r23 = 01 r13 = 05 )

                                          Test of difference between two correlated correlations

                                          t value -089 with probability lt 037

                                          4) For sample size n test for the difference between two dependent correlations involvingdifferent variables (Steiger case B)

                                          gt rtest(103567558) steiger Case B

                                          Correlation tests

                                          Callrtest(n = 103 r12 = 05 r34 = 06 r23 = 07 r13 = 05 r14 = 05

                                          r24 = 08)

                                          Test of difference between two dependent correlations

                                          z value -12 with probability 023
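The first two cases can be reproduced with a few lines of base R. The following is a minimal sketch, not the code used inside r.test; the helper names r_single and r_independent are hypothetical and introduced only for illustration. It assumes the usual t test for a single correlation, a Fisher z confidence interval, and the z test for two independent correlations.

```r
# Case 1: t test and Fisher z confidence interval for a single correlation.
r_single <- function(n, r, alpha = 0.05) {
  t  <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p  <- 2 * pt(-abs(t), df = n - 2)          # two tailed probability
  se <- 1 / sqrt(n - 3)                      # standard error of Fisher z
  ci <- tanh(atanh(r) + c(-1, 1) * qnorm(1 - alpha / 2) * se)
  list(t = t, p = p, ci = ci)
}

# Case 2: z test for the difference of two independent correlations.
r_independent <- function(n, r12, r34, n2 = n) {
  se <- sqrt(1 / (n - 3) + 1 / (n2 - 3))     # s.e. of the difference of two z scores
  z  <- abs(atanh(r12) - atanh(r34)) / se
  list(z = z, p = 2 * pnorm(-z))
}

one <- r_single(50, 0.3)
two <- r_independent(30, 0.4, 0.6)
round(c(t = one$t, p = one$p, lower = one$ci[1], upper = one$ci[2]), 2)
round(c(z = two$z, p = two$p), 2)
```

Rounded to two places, these agree with the r.test output shown for cases 1 and 2. The dependent correlation tests (Steiger cases A and B) need the additional correlations among the variables and are not sketched here.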

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
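The logic of this test can be sketched in base R. This is an illustration under stated assumptions, not the cortest implementation: the helper name cor_matrix_test is hypothetical, the correlations are the rounded values printed in the corr.test table, and a single sample size of 700 is assumed even though the pairwise sample sizes range from 687 to 700.

```r
# Chi square test that all population correlations are zero (after Steiger, 1980):
# (n - 3) times the sum of the squared Fisher z transformed correlations.
cor_matrix_test <- function(R, n) {
  r    <- R[lower.tri(R)]                    # the p(p-1)/2 distinct correlations
  chi2 <- (n - 3) * sum(atanh(r)^2)
  df   <- length(r)
  list(chi2 = chi2, df = df, p = pchisq(chi2, df, lower.tail = FALSE))
}

vars <- c("gender", "education", "age", "ACT", "SATV", "SATQ")
R <- diag(6)
dimnames(R) <- list(vars, vars)
# Rounded sat.act correlations, entered column by column below the diagonal.
R[lower.tri(R)] <- c(0.09, -0.02, -0.04, -0.02, -0.17,
                     0.55,  0.15,  0.05,  0.03,
                     0.11, -0.04, -0.03,
                     0.56,  0.59,
                     0.64)
R[upper.tri(R)] <- t(R)[upper.tri(R)]        # make the matrix symmetric

res <- cor_matrix_test(R, n = 700)
res$chi2
res$df
```

With the rounded input the statistic lands close to, but not exactly on, the value reported by cortest; the degrees of freedom are 15 in both cases.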

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
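The size of the underestimate can be worked out exactly in one special case. The sketch below is an illustration only: it assumes both variables are cut at their means (τ = Τ = 0), where the proportion of cases above both cuts has a closed form. For any other cut points the tetrachoric correlation has to be found numerically.

```r
rho <- 0.5                                  # latent bivariate normal correlation

# Probability that both X and Y exceed a cut at 0 (closed form for the bivariate normal).
p11 <- 0.25 + asin(rho) / (2 * pi)

# phi for a 2 x 2 table with all four marginal proportions equal to .5
phi <- (p11 - 0.5 * 0.5) / sqrt(0.5 * 0.5 * 0.5 * 0.5)

# Inverting the relationship recovers the latent correlation from phi.
tetra <- sin(pi * phi / 2)

round(c(rho = rho, phi = phi, tetrachoric = tetra), 2)
```

For ρ = 0.5 this gives φ = 1/3, the "rho = 0.5, phi = 0.33" pair shown in Figure 14.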

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix formed from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
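The three smoothing steps just described can be sketched in base R. This is a sketch under stated assumptions rather than the cor.smooth source: the function name smooth_cor, the tolerance eig.tol, and the replacement value 100 * eig.tol are choices made for the illustration, and the 3 x 3 matrix is an artificial improper matrix, not the burt data.

```r
smooth_cor <- function(R, eig.tol = 1e-12) {
  e    <- eigen(R, symmetric = TRUE)
  vals <- e$values
  if (min(vals) < eig.tol) {
    vals[vals < eig.tol] <- 100 * eig.tol    # 1. make the offending eigen values positive
    vals <- vals * ncol(R) / sum(vals)       # 2. rescale to sum to the number of variables
    S <- e$vectors %*% diag(vals) %*% t(e$vectors)
    S <- cov2cor(S)                          # 3. restore the unit diagonal
    dimnames(S) <- dimnames(R)
    R <- S
  }
  R
}

# An impossible pattern: r12 = r13 = .9 but r23 = -.9
bad <- matrix(c(1.0,  0.9,  0.9,
                0.9,  1.0, -0.9,
                0.9, -0.9,  1.0), 3, 3)
min(eigen(bad, symmetric = TRUE)$values)     # negative, so not positive semi-definite

smoothed <- smooth_cor(bad)
round(smoothed, 2)
min(eigen(smoothed, symmetric = TRUE)$values)
```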

                                          4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

> draw.tetra()

[Figure: scatter plot of a bivariate normal distribution (both axes from -3 to 3, rho = 0.5, phi = 0.33) cut at thresholds τ on X and Τ on Y into the four quadrants X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; and X < τ, Y < Τ. The marginal normal densities dnorm(x) are shown with the regions X > τ and Y > Τ marked.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Figure: perspective plot of the bivariate normal density (axes x, y, z), titled "Bivariate density rho = 0.5".]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values, or the group means.
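Equation 1 can be checked numerically. The following base R sketch uses simulated data (the group sizes, effect sizes, and variable names are assumptions made for the illustration) and does not call statsBy. The between group terms are computed on the group means repeated for every member of the group, which is what makes the between group correlation a weighted one.

```r
set.seed(42)
group <- rep(1:5, times = c(10, 15, 20, 25, 30))   # five groups of unequal size
x <- rnorm(100) + rnorm(5)[group]                  # individual plus group level variation
y <- 0.5 * x + rnorm(100) + rnorm(5)[group]

x_bg <- ave(x, group)        # group mean, repeated for each group member
y_bg <- ave(y, group)
x_wg <- x - x_bg             # deviation of each score from its group mean
y_wg <- y - y_bg

eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)

r_xy <- cor(x, y)
decomposed <- eta_x_wg * eta_y_wg * cor(x_wg, y_wg) +
              eta_x_bg * eta_y_bg * cor(x_bg, y_bg)

c(observed = r_xy, decomposed = decomposed)
```

The two values agree to rounding error, because the within group deviations are uncorrelated with the group means.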

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.69     0.63          0.50      0.58         0.48

multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.48     0.40          0.25      0.34         0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary Sent.Completion First.Letters
     3.69       3.88            3.00          1.35

Unweighted multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.59     0.58          0.49      0.58         0.45

Unweighted multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.34     0.34          0.24      0.33         0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
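The beta weights, squared multiple correlations, and variance inflation factors in output of this kind depend only on the correlation matrix. The sketch below shows the underlying matrix algebra in base R. It is not the setCor source, and to stay self-contained it uses the attitude data set that ships with R rather than Thurstone; the choice of x and y variables is arbitrary.

```r
R <- cor(attitude)                       # any correlation matrix will do
x <- c("complaints", "learning")         # predictors
y <- c("rating", "raises")               # criteria

Rxx <- R[x, x]
Rxy <- R[x, y]

beta <- solve(Rxx, Rxy)                  # standardized beta weights, one column per y
R2   <- colSums(beta * Rxy)              # squared multiple correlation for each y
vif  <- diag(solve(Rxx))                 # 1/(1 - SMC) for each predictor

round(beta, 2)
round(R2, 2)
round(vif, 2)
```

The same beta weights are found by running lm on the standardized raw data, which is the sense in which raw data are not needed for this part of the analysis.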

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.58     0.46          0.21      0.18         0.30

multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
            0.331    0.210         0.043     0.032        0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion First.Letters
           1.02          1.02

Unweighted multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.44     0.35          0.17      0.14         0.26

Unweighted multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.19     0.12          0.03      0.02         0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
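The residual matrix has a simple form in terms of the correlation matrix: the part of Ryy that is predictable from the removed variables is subtracted out. A base R sketch of that calculation follows, again using the attitude data as a stand-in (an assumption of the example, as is the choice of variables) and without calling setCor.

```r
R <- cor(attitude)
y <- c("rating", "raises", "advance")    # variables whose residuals are wanted
x <- c("complaints", "learning")         # variables to be removed

# Residual covariances of the standardized y variables after removing x
resid_cov <- R[y, y] - R[y, x] %*% solve(R[x, x]) %*% R[x, y]
resid_cor <- cov2cor(resid_cov)          # rescaled to partial correlations

round(resid_cov, 2)
round(resid_cor, 2)
```

The diagonal of resid_cov is 1 - R2 for each y variable. The same pattern is visible in the output above, where the diagonal entries (0.52, 0.60, 0.75, 0.66, 0.77) are one minus the squared multiple correlations found when all four predictors were used.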

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
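The paths and the bootstrap can be sketched with lm alone. This is a minimal illustration on simulated data, not the mediate function: the helper name paths, the sample size, the path values, and the 500 bootstrap samples are all assumptions of the example. It labels the effect of x on y without the mediator as the total effect and the effect with the mediator included as the direct effect.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)                  # mediator depends on x
y <- 0.4 * m + 0.2 * x + rnorm(n)        # criterion depends on m and x
d <- data.frame(x, m, y)

paths <- function(d) {
  a        <- coef(lm(m ~ x, data = d))[["x"]]        # x -> m
  full     <- coef(lm(y ~ x + m, data = d))
  b        <- full[["m"]]                             # m -> y, controlling for x
  c_direct <- full[["x"]]                             # x -> y, controlling for m
  c_total  <- coef(lm(y ~ x, data = d))[["x"]]        # x -> y, ignoring m
  c(a = a, b = b, ab = a * b, c_total = c_total, c_direct = c_direct)
}

est <- paths(d)

# Percentile bootstrap of the indirect effect ab
boot_ab <- replicate(500, paths(d[sample(n, replace = TRUE), ])[["ab"]])
ci <- quantile(boot_ab, c(0.025, 0.975))

round(est, 2)
round(ci, 2)
```

For ordinary least squares the total effect is exactly the direct effect plus ab, which is why only ab needs a separate test.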


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  SE = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  SE = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model". THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS labeled c = 0.76 and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct effect is .43 once Attribution is included), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models". THERAPY to SATIS = 0.43; ATTRIB to SATIS = 0.4; the path connecting THERAPY and ATTRIB is labeled 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

The $\lambda_i$ are the squared canonical correlations between the two sets.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
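Both statistics are easy to compute from the squared canonical correlations. The sketch below is a base R illustration, not the setCor source; it uses the attitude data with an arbitrary split into two sets (both assumptions of the example) and then applies the same two summaries to the squared canonical correlations printed earlier for the Thurstone example.

```r
X <- as.matrix(attitude[, c("complaints", "privileges", "learning")])
Y <- as.matrix(attitude[, c("rating", "raises")])
Rxx <- cor(X); Ryy <- cor(Y); Rxy <- cor(X, Y)

# Eigen values of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations
# (plus zeros when one set has more variables than the other).
M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- sort(Re(eigen(M)$values), decreasing = TRUE)

set_R2 <- 1 - prod(1 - lambda)                    # Cohen's set correlation
avg_cc2 <- mean(lambda[1:min(ncol(X), ncol(Y))])  # average squared canonical correlation

# The same two summaries for the values printed in the Thurstone output
thurstone_cc2 <- c(0.6280, 0.1478, 0.0076, 0.0049)
thurstone_set_R2 <- 1 - prod(1 - thurstone_cc2)
thurstone_avg <- mean(thurstone_cc2)

round(c(set_R2 = set_R2, avg_cc2 = avg_cc2), 2)
round(c(set_R2 = thurstone_set_R2, avg_cc2 = thurstone_avg), 2)
```

For the Thurstone values the set correlation is 0.69 while the average squared canonical correlation is only 0.2, because a single large canonical correlation dominates the product.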

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

                                          For this example the analysis is done on the correlation matrix rather than the rawdata

                                          gt C lt- cov(satactuse=pairwise)

                                          gt model1 lt- lm(ACT~ gender + education + age data=satact)

                                          gt summary(model1)

                                          Call

                                          lm(formula = ACT ~ gender + education + age data = satact)

                                          Residuals

                                          44

                                          Call mediate(y = c(SATQ) x = c(ACT) m = education data = satact

                                          mod = gender niter = 50 std = TRUE)

                                          The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

                                          Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

                                          Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

                                          Indirect effect (ab) of ACT on SATQ through education = -001

                                          Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

                                          Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

                                          Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

                                          Indirect effect (ab) of gender on SATQ through education = 0

                                          Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

                                          Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

                                          Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

                                          Indirect effect (ab) of ACTXgndr on SATQ through education = 0

                                          Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

                                          R2 of model = 037

                                          To see the longer output specify short = FALSE in the print statement

                                          Full output

                                          Total effect estimates (c)

                                          SATQ se t Prob

                                          ACT 058 003 1925 000e+00

                                          gender -014 003 -478 210e-06

                                          ACTXgndr 000 003 002 985e-01

                                          Direct effect estimates (c)SATQ se t Prob

                                          ACT 059 003 1926 000e+00

                                          gender -014 003 -463 437e-06

                                          ACTXgndr 000 003 001 992e-01

                                          a effect estimates

                                          education se t Prob

                                          ACT 016 004 422 277e-05

                                          gender 009 004 250 128e-02

                                          ACTXgndr -001 004 -015 883e-01

                                          b effect estimates

                                          SATQ se t Prob

                                          education -004 003 -145 0147

                                          ab effect estimates

                                          SATQ boot sd lower upper

                                          ACT -001 -001 001 0 0

                                          gender 000 000 000 0 0

                                          ACTXgndr 000 000 000 0 0

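[Figure 18, path diagram titled "Moderation model": ACT, gender, and ACTXgndr predict SATQ, with education as the mediator. Paths to education: ACT 0.16, gender 0.09, ACTXgndr -0.01. Path from education to SATQ: -0.04. Total (c) and direct (c') effects on SATQ: ACT c = 0.58, c' = 0.59; gender c = -0.14, c' = -0.14; ACTXgndr c = 0, c' = 0.]

Figure 18: Moderated multiple regression requires the raw data.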
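As a rough check on how the printed mediation estimates fit together, the following base R sketch recomputes the indirect effects as the product of the a and b paths and recovers the direct effects from the usual decomposition c = c' + ab. The numbers are copied from the rounded output above, and the variable names (a, b, c_tot, c_direct) are illustrative only; this is not how mediate computes its estimates, which work from the raw data and bootstrap the indirect effect.

```r
# Standardized path estimates copied from the printed output (rounded to 2 decimals).
a     <- c(ACT = 0.16, gender = 0.09, ACTXgndr = -0.01)  # predictors -> education
b     <- -0.04                                            # education -> SATQ
c_tot <- c(ACT = 0.58, gender = -0.14, ACTXgndr = 0.00)   # total effects (c)

ab       <- a * b        # indirect effects through education
c_direct <- c_tot - ab   # direct effects (c') implied by c = c' + ab

# Side-by-side view of the pieces, rounded as in the printed output.
round(cbind(a, b, ab, c = c_tot, c_direct), 2)
```

Because the inputs are already rounded, the results agree with the printed ab and c' columns only to two decimals.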


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
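The shrunken R2 and F values that lm and setCor report for these regressions can be reproduced from R2, the sample size, and the number of predictors. The sketch below uses the standard adjusted R2 and F formulas in base R. It is an assumption that setCor uses exactly these formulas internally, although the numbers agree with the printed output up to rounding of the R2 inputs. The variable names are illustrative.

```r
# R2 values as printed by setCor (rounded); n and k come from the example.
R2 <- c(ACT = 0.0272, SATV = 0.0096, SATQ = 0.0359)
n  <- 700   # number of observations (n.obs)
k  <- 3     # number of predictors: gender, education, age

df1 <- k
df2 <- n - k - 1                                  # residual degrees of freedom

shrunken_R2 <- 1 - (1 - R2) * (n - 1) / df2       # adjusted (shrunken) R2
F_stat      <- (R2 / df1) / ((1 - R2) / df2)      # F test of each multiple R2
p_value     <- pf(F_stat, df1, df2, lower.tail = FALSE)

round(rbind(R2, shrunken_R2, F_stat, p_value), 4)
```

For ACT this gives a shrunken R2 of about 0.023 and an F of about 6.49 on 3 and 696 degrees of freedom, in line with both the lm summary and the setCor output.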

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.5

MR1              1.00  0.59  0.54
MR2              0.59  1.00  0.52
MR3              0.54  0.52  1.00


                                          7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
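To make a few of these helpers concrete, here is a minimal base R sketch of the computations that fisherz, geometric.mean, harmonic.mean, and superMatrix are described as performing. It deliberately avoids calling psych, so the code below is an illustration of the underlying arithmetic rather than the package's implementation, and the object names (z, geo_mean, harm_mean, super) are made up for the example.

```r
# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r)), i.e. atanh(r).
r <- 0.5
z <- 0.5 * log((1 + r) / (1 - r))

# Geometric and harmonic means of a set of positive numbers.
x <- c(1, 2, 4, 8)
geo_mean  <- exp(mean(log(x)))   # n-th root of the product of the values
harm_mean <- 1 / mean(1 / x)     # reciprocal of the mean reciprocal

# "Super matrix": A on the top left, B on the lower right, 0s in the other quadrants.
A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 1)
super <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
               cbind(matrix(0, nrow(B), ncol(A)), B))
super
```

With psych loaded, the corresponding calls would be along the lines of fisherz(r), geometric.mean(x), harmonic.mean(x), and superMatrix(A, B); check the help pages for the exact arguments.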

                                          8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package = "psych")


                                          10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                          11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                          References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


                                          rtest 28

                                          rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

                                          R package

                                          61

                                          ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

                                          rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

                                          SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

                                          spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

                                          table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

                                          vegetables 50 51violinBy 14 18vss 5 6

                                          weighted least squares 6withinBetween 37

                                          xtable 47

                                          62


> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+ main="Proportion of sample by education level")

[Plot: "Proportion of sample by education level". x axis: Level of Education (bars labeled M 0 to M 5); y axis: Proportion of Education Level, 0.00 to 0.30.]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
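The three way options correspond to three different denominators for the cell proportions. The following is a minimal base R sketch of that idea, using a small made-up table of counts rather than the sat.act data. The binomial standard error sqrt(p(1-p)/n) is an assumption made for illustration; error.bars.tab may compute its errors differently.

```r
# Hypothetical counts: 2 genders by 3 education levels (not the sat.act data)
counts <- matrix(c(10, 20, 30,
                   20, 40, 80),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("M", "F"),
                                 education = c("0", "1", "2")))

p_both    <- prop.table(counts)      # each cell divided by the grand total
p_rows    <- prop.table(counts, 1)   # each cell divided by its row total
p_columns <- prop.table(counts, 2)   # each cell divided by its column total

# Binomial standard error of each grand-total proportion (an assumption here)
n <- sum(counts)
se_both <- sqrt(p_both * (1 - p_both) / n)

print(round(p_both, 3))
print(round(se_both, 3))
```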

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Plot, two panels. Left: "Movies effect on arousal", x axis Energetic Arousal (8 to 20), y axis Tense Arousal (10 to 22). Right: "Movies effect on affect", x axis Positive Affect (6 to 12), y axis Negative Affect (2 to 10). Points in both panels are labeled Sad, Horror, Neutral, and Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
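The statistics behind such a display are group means and standard errors on two variables at once, plus the group sizes. A rough base R sketch follows, using simulated scores for four hypothetical film conditions. The variable names EA and TA and the data themselves are made up for illustration and are not the affect data set.

```r
# Simulated scores for four hypothetical film conditions (not the affect data)
set.seed(42)
film <- factor(rep(c("Sad", "Horror", "Neutral", "Happy"), each = 25))
d <- data.frame(film = film,
                EA = rnorm(100, mean = 14, sd = 3),
                TA = rnorm(100, mean = 15, sd = 3))

# Standard error of the mean
sem <- function(x) sd(x) / sqrt(length(x))

# One row per group: size, then mean and standard error on each axis
grp <- levels(d$film)
group_stats <- data.frame(
  n       = as.vector(table(d$film)),
  mean_EA = as.vector(tapply(d$EA, d$film, mean)),
  se_EA   = as.vector(tapply(d$EA, d$film, sem)),
  mean_TA = as.vector(tapply(d$TA, d$film, mean)),
  se_TA   = as.vector(tapply(d$TA, d$film, sem)),
  row.names = grp
)
print(round(group_stats, 2))
```

Each row supplies the center of one point (mean_EA, mean_TA), the half-lengths of its horizontal and vertical error bars, and a group size that could be used to scale a circle.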

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.
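The idea behind a back to back histogram can be sketched in base R: bin both groups on the same breaks and negate the counts of one group so that its bars extend in the opposite direction. The ages and group codes below are simulated, and the bin width of 5 is an arbitrary choice. This is not how bi.bars itself is implemented.

```r
# Simulated ages for two groups (hypothetical data, not the bfi data set)
set.seed(1)
age    <- c(round(runif(60, 18, 60)), round(runif(80, 18, 60)))
gender <- rep(c(1, 2), times = c(60, 80))

# Common bins so that the two sides are comparable
breaks <- seq(15, 60, by = 5)
left   <- hist(age[gender == 1], breaks = breaks, plot = FALSE)$counts
right  <- hist(age[gender == 2], breaks = breaks, plot = FALSE)$counts

# Negating one group's counts places its bars on the other side of zero
back_to_back <- data.frame(bin_mid = head(breaks, -1) + 2.5,
                           group1  = -left,
                           group2  = right)
print(back_to_back)
```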

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
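For readers who want to see what such a display involves, here is a base R approximation: pairwise-complete Pearson correlations, rounded to two digits, with the upper triangle blanked. The data are simulated and the column names a, b, and c are placeholders. The formatting will not match lowerCor exactly.

```r
# Simulated data with one missing value (hypothetical, not sat.act)
set.seed(7)
x <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
x$c[3] <- NA

# Pairwise-complete Pearson correlations, the defaults described for lowerCor
R <- cor(x, use = "pairwise", method = "pearson")

# Show only the lower triangle (with the diagonal), rounded to 2 digits
lower <- format(round(R, 2), nsmall = 2)
lower[upper.tri(lower)] <- ""
print(noquote(lower))
```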

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
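The layout of both displays can be reproduced in a few lines of base R. The sketch below uses the rounded education, age, and ACT correlations from the male and female output above. The helper combine_lower_upper is a hypothetical name introduced for illustration and is not part of psych.

```r
vars <- c("education", "age", "ACT")
# Rounded correlations taken from the male and female output above
male <- matrix(c(1.00, 0.61, 0.16,
                 0.61, 1.00, 0.15,
                 0.16, 0.15, 1.00), 3, 3, dimnames = list(vars, vars))
female <- matrix(c(1.00, 0.52, 0.16,
                   0.52, 1.00, 0.08,
                   0.16, 0.08, 1.00), 3, 3, dimnames = list(vars, vars))

# Hypothetical helper: first matrix below the diagonal, second matrix (or the
# difference, first minus second) above it, and NA on the diagonal
combine_lower_upper <- function(lower, upper, diff = FALSE) {
  out <- lower
  above <- upper.tri(out)
  out[above] <- if (diff) (lower - upper)[above] else upper[above]
  diag(out) <- NA
  out
}

both  <- combine_lower_upper(male, female)
diffs <- combine_lower_upper(male, female, diff = TRUE)
print(round(both, 2))
print(round(diffs, 2))
```

The entries above the diagonal of diffs (0.09, 0.00, 0.07) agree with the corresponding cells of the lowerUpper output, which suggests that the difference is taken as the first matrix minus the second.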

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
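The mapping from correlation to radial length can be sketched as follows. The linear rescaling (r + 1)/2 is an assumption consistent with the endpoints stated above, and the idealized circumplex, in which the correlation between two variables is the cosine of the angle between them, is used only to generate example values.

```r
# Map a correlation in [-1, 1] to a radial length in [0, 1] (assumed linear)
radial_length <- function(r) (r + 1) / 2

# 24 variables at equally spaced angles around a circle
angles <- seq(0, 2 * pi, length.out = 25)[-25]

# Idealized circumplex: correlation with the first variable is the cosine
# of the angular separation (an assumption for illustration)
r_with_first <- cos(angles - angles[1])
len <- radial_length(r_with_first)

# End points of the 24 spokes for the first variable's spider plot
spokes <- data.frame(x = len * cos(angles), y = len * sin(angles))
print(round(head(spokes), 2))
```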

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, raw probability values are not corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option
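The layout of the probability table, with raw values below the diagonal and Holm-adjusted values above it, can be imitated with p.adjust from the stats package. The three raw p values and the variable names x1 to x3 below are made up for illustration. The rounded entries printed in Table 1 are too coarse to reproduce its adjusted values.

```r
vars <- c("x1", "x2", "x3")

# Hypothetical raw two-tailed p values for the three pairs of variables,
# filled in the order (x2,x1), (x3,x1), (x3,x2)
raw <- matrix(0, 3, 3, dimnames = list(vars, vars))
raw[lower.tri(raw)] <- c(0.01, 0.04, 0.03)
raw <- raw + t(raw)

# Holm adjustment applied once across all unique pairs
adjusted <- matrix(0, 3, 3, dimnames = list(vars, vars))
adjusted[lower.tri(adjusted)] <- p.adjust(raw[lower.tri(raw)], method = "holm")
adjusted <- adjusted + t(adjusted)

# Raw values below the diagonal, Holm-adjusted values above it
p_table <- raw
p_table[upper.tri(p_table)] <- adjusted[upper.tri(adjusted)]
print(p_table)
```

With three tests, the smallest p value is multiplied by 3, the next by 2, and the largest by 1, and no adjusted value is allowed to fall below the one before it. That is why the pair with a raw value of 0.04 ends up at 0.06 rather than 0.04.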

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability 0.23
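The first three results can be checked by hand in base R. The sketch below uses the textbook t test and Fisher z formulas for cases 1 and 2. For case 3 it uses a t statistic of the form discussed by Steiger (1980) for two correlations that share a variable. Whether r.test uses exactly these expressions is an assumption, although they reproduce the values printed above up to sign and rounding.

```r
# 1) Single correlation: t test and Fisher z confidence interval
n <- 50; r <- 0.3
t1 <- r * sqrt(n - 2) / sqrt(1 - r^2)
p1 <- 2 * pt(-abs(t1), df = n - 2)
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

# 2) Two independent correlations: difference of Fisher z scores
#    divided by the standard error of that difference
n1 <- 30; n2 <- 30; ra <- 0.4; rb <- 0.6
z2 <- abs(atanh(ra) - atanh(rb)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p2 <- 2 * pnorm(-z2)

# 3) Two dependent correlations sharing a variable (Steiger case A)
n3 <- 103; r12 <- 0.4; r23 <- 0.1; r13 <- 0.5
detR <- 1 - r12^2 - r23^2 - r13^2 + 2 * r12 * r23 * r13
rbar <- (r12 + r13) / 2
t3 <- (r12 - r13) * sqrt((n3 - 1) * (1 + r23) /
        (2 * ((n3 - 1) / (n3 - 3)) * detR + rbar^2 * (1 - r23)^3))
p3 <- 2 * pt(-abs(t3), df = n3 - 3)

print(round(c(t1 = t1, p1 = p1, lower = ci[1], upper = ci[2]), 3))
print(round(c(z2 = z2, p2 = p2), 3))
print(round(c(t3 = t3, p3 = p3), 3))
```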

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
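A rough version of this test is easy to write in base R: convert each unique correlation to a Fisher z, sum the squares, and scale by n - 3, since each z has variance 1/(n - 3) under the null hypothesis. The helper identity_test and the simulated data are illustrative only. cortest may differ in details such as how missing data and the sample size are handled.

```r
# Hypothetical helper: test whether all off-diagonal correlations are zero
identity_test <- function(R, n) {
  z    <- atanh(R[lower.tri(R)])   # Fisher z of each unique correlation
  chi2 <- (n - 3) * sum(z^2)       # each z has variance 1/(n - 3) under the null
  df   <- length(z)                # p(p - 1)/2 unique correlations
  list(chi2 = chi2, df = df,
       p = pchisq(chi2, df, lower.tail = FALSE))
}

# Simulated data: 6 variables, with the first two made to correlate
set.seed(3)
x <- matrix(rnorm(700 * 6), ncol = 6)
x[, 2] <- x[, 1] + x[, 2]

res <- identity_test(cor(x), n = nrow(x))
print(res)
```

With six variables there are 15 unique correlations, which matches the df = 15 reported for sat.act above.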

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
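The size of the underestimate can be worked out in closed form for the special case in which both variables are cut at zero. For a bivariate normal with correlation rho, P(X > 0, Y > 0) = 1/4 + asin(rho)/(2 pi), so φ = 2 asin(rho)/pi, and the latent correlation can be recovered as rho = sin(pi φ/2). The sketch below checks this by simulation. It covers only cuts at the medians and is not the general estimation procedure used by the tetrachoric function. The function names are hypothetical.

```r
# Both variables cut at zero: phi implied by a latent correlation rho
phi_at_zero_cuts <- function(rho) 2 * asin(rho) / pi

# Inverse: latent correlation implied by an observed phi (median cuts only)
rho_from_phi <- function(phi) sin(pi * phi / 2)

print(phi_at_zero_cuts(0.5))   # about 0.333
print(rho_from_phi(1 / 3))     # about 0.5

# Check by simulation with a latent correlation of 0.5 (simulated data)
set.seed(11)
n <- 1e5
x <- rnorm(n)
y <- 0.5 * x + sqrt(1 - 0.5^2) * rnorm(n)
phi_hat <- cor(as.numeric(x > 0), as.numeric(y > 0))
print(round(c(latent = cor(x, y), phi = phi_hat), 3))
```

A latent correlation of 0.5 therefore shows up as a φ of about 0.33 once both variables are dichotomized at zero.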

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix assembled from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will also sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
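The smoothing steps described above can be sketched in base R as follows. The helper smooth_cor, the floor of 1e-6 for small eigenvalues, and the example matrix are all assumptions made for illustration. cor.smooth may use a different tolerance and differ in other details.

```r
# Hypothetical helper: eigenvalue smoothing of an improper correlation matrix
smooth_cor <- function(R, floor = 1e-6) {
  e    <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, floor)          # raise negative or tiny eigenvalues
  vals <- vals * ncol(R) / sum(vals)     # rescale to sum to the number of variables
  S    <- e$vectors %*% diag(vals) %*% t(e$vectors)
  S    <- cov2cor(S)                     # restore a unit diagonal
  dimnames(S) <- dimnames(R)
  S
}

# A made-up matrix that cannot be a real correlation matrix
R <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), 3, 3)

print(min(eigen(R, symmetric = TRUE)$values))    # negative
Rs <- smooth_cor(R)
print(round(Rs, 2))
print(min(eigen(Rs, symmetric = TRUE)$values))   # positive
```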

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use

                                            gt drawtetra()

                                            minus3 minus2 minus1 0 1 2 3

                                            minus3

                                            minus2

                                            minus1

                                            01

                                            23

                                            Y rho = 05phi = 033

                                            X gt τY gt Τ

                                            X lt τY gt Τ

                                            X gt τY lt Τ

                                            X lt τY lt Τ

                                            x

                                            dnor

                                            m(x

                                            )

                                            X gt τ

                                            τ

                                            x1

                                            Y gt Τ

                                            Τ

                                            Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values

                                            35

                                            gt drawcor(expand=20cuts=c(00))

                                            xy

                                            z

                                            Bivariate density rho = 05

                                            Figure 15 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values It isfound (laboriously) by optimizing the fit of the bivariate normal for various values of thecorrelation to the observed cell frequencies

                                            36

                                            is the ability to decompose a matrix of correlations at the individual level into correlationswithin group and correlations between groups

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
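Equation (1) can be checked numerically with a few lines of base R. The sketch below uses simulated data and hypothetical variable names (group, x_wg, x_bg, and so on); it is not the statsBy implementation. The between group part is represented by each person's group mean, which weights the group means by group size.

```r
# Illustrative check of equation (1) on simulated data; all names are made up.
set.seed(42)
n_groups <- 5
n_per    <- 20
group <- rep(seq_len(n_groups), each = n_per)
group_effect <- rnorm(n_groups)[group]
x <- group_effect + rnorm(n_groups * n_per)
y <- 0.5 * group_effect - 0.3 * x + rnorm(n_groups * n_per)

x_bg <- ave(x, group)      # group mean repeated for every member of the group
y_bg <- ave(y, group)
x_wg <- x - x_bg           # deviations from the group mean
y_wg <- y - y_bg

eta_x_wg <- cor(x, x_wg);  eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg);  eta_y_bg <- cor(y, y_bg)
r_wg <- cor(x_wg, y_wg)    # pooled within group correlation
r_bg <- cor(x_bg, y_bg)    # (weighted) correlation of the group means

reconstructed <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
all.equal(reconstructed, cor(x, y))
```

The two sides agree because the within group deviations are uncorrelated with the group means, so the covariance of x and y splits exactly into a within part and a between part.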

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)

faBy(sb,nfactors=5)  #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
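Why a correlation matrix is enough for multiple regression can be seen in a short base R sketch. It uses the standard result that standardized weights are the inverse of the predictor correlations times the predictor-criterion correlations; the simulated data and all object names are assumptions for illustration, and this is not the setCor code itself.

```r
# Illustrative sketch: standardized regression from a correlation matrix alone.
set.seed(3)
n <- 300
X <- matrix(rnorm(n * 3), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
yv <- 0.5 * X[, "x1"] - 0.3 * X[, "x2"] + rnorm(n)

R   <- cor(cbind(X, y = yv))       # all that the matrix approach needs
Rxx <- R[1:3, 1:3]                 # predictor intercorrelations
rxy <- R[1:3, 4]                   # predictor-criterion correlations

beta <- solve(Rxx, rxy)            # standardized beta weights
R2   <- sum(beta * rxy)            # squared multiple correlation
vif  <- diag(solve(Rxx))           # 1/(1 - SMC) for each predictor

# The same weights from raw data, after standardizing every variable
z   <- as.data.frame(scale(cbind(X, y = yv)))
fit <- lm(y ~ x1 + x2 + x3, data = z)
all.equal(unname(coef(fit)[-1]), unname(beta))
```

Standard errors and significance tests additionally need the number of observations, which is why a sample size has to be supplied when only a matrix is analyzed.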

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
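The logic of the paths and of the bootstrap test can be sketched in base R. The data below are simulated and every name (dat, indirect, boot_ab) is hypothetical; this is a percentile bootstrap written for illustration under those assumptions, not the mediate function.

```r
# Illustrative sketch of an indirect effect and its percentile bootstrap.
set.seed(1)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.2 * x + rnorm(n)
dat <- data.frame(x, m, y)

indirect <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]       # path from x to m
  b <- coef(lm(y ~ x + m, data = d))[["m"]]   # path from m to y, controlling for x
  a * b
}

total  <- coef(lm(y ~ x, data = dat))[["x"]]      # effect of x ignoring m
direct <- coef(lm(y ~ x + m, data = dat))[["x"]]  # effect of x with m in the model
ab     <- indirect(dat)
all.equal(total, direct + ab)    # total effect = direct effect + indirect effect

# Resample rows with replacement and recompute ab each time
boot_ab <- replicate(1000, indirect(dat[sample(n, replace = TRUE), ]))
ci <- quantile(boot_ab, c(0.025, 0.975))
```

If the interval in ci excludes zero, the indirect effect is taken to be reliably different from zero. The bootstrap is preferred here because the sampling distribution of a product of two coefficients is generally not normal.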


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000.

> mediate.diagram(preacher)

[Diagram: Mediation model. THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, THERAPY to SATIS c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)

> setCor.diagram(preacher)

[Diagram: Regression Models. THERAPY and ATTRIB predict SATIS; path labels 0.43, 0.4, and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$
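A short base R sketch shows how the set correlation follows from the eigenvalues. It assumes the matrix being decomposed is the usual canonical correlation form, the inverse of Rxx times Rxy times the inverse of Ryy times Ryx, whose eigenvalues are the squared canonical correlations. The simulated data and all object names are illustrative and are not taken from setCor.

```r
# Illustrative sketch: set correlation from the eigenvalues of a product matrix.
set.seed(7)
dat <- matrix(rnorm(200 * 5), ncol = 5)
dat[, 4] <- dat[, 4] + 0.6 * dat[, 1]     # build in some x-y relationship
dat[, 5] <- dat[, 5] + 0.4 * dat[, 2]
R <- cor(dat)

x <- 1:3                                   # first set
y <- 4:5                                   # second set
Rxx <- R[x, x]; Ryy <- R[y, y]; Rxy <- R[x, y]

M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M)$values)              # squared canonical correlations (plus a zero)
set_R2 <- 1 - prod(1 - lambda)

# Equivalent determinant form of the same quantity
set_R2_det <- 1 - det(R) / (det(Rxx) * det(Ryy))
all.equal(set_R2, set_R2_det)

# Applying the formula to the squared canonical correlations printed
# in the first Thurstone example above
thurstone_R2 <- round(1 - prod(1 - c(0.6280, 0.1478, 0.0076, 0.0049)), 2)
```

The last line reproduces the value of 0.69 reported as Cohen's set correlation for the Thurstone example, which is a useful check that the formula is being read correctly.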

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act, mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02   Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01   Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0   Upper CI = 0
R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Diagram: Moderation model. ACT, gender, and ACTXgndr predict SATQ, both directly and through education; the path labels repeat the a, b, c, and c' estimates listed above.]

Figure 18: Moderated multiple regression requires the raw data.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")

> model1 <- lm(ACT~ gender + education + age, data=sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor

> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R² is the same independent of the direction of the relationship.
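The set correlation reported above can be checked by hand. The sketch below uses Cohen's (1982) definition, one minus the product of (1 - squared canonical correlation), applied to the squared canonical correlations printed in the output. The symmetry check uses simulated data and base R's cancor; the variables x and y are hypothetical and are not the sat.act variables.

```r
# Squared canonical correlations as printed in the setCor output
rc2 <- c(0.050, 0.033, 0.008)

# Cohen's set correlation R2: 1 - product of (1 - squared canonical correlations)
set_r2 <- 1 - prod(1 - rc2)
avg_rc2 <- mean(rc2)

round(set_r2, 2)   # 0.09, matching "Cohen's Set Correlation R2"
round(avg_rc2, 2)  # 0.03, matching "Average squared canonical correlation"

# Symmetry check on simulated data (hypothetical variables, not sat.act)
set.seed(42)
n <- 200
x <- matrix(rnorm(n * 3), n, 3)
y <- x %*% matrix(c(0.3, 0, 0, 0.2, 0.1, 0), 3, 2) + matrix(rnorm(n * 2), n, 2)

# Canonical correlations are the same whichever set is treated as "predictor"
rho_xy <- stats::cancor(x, y)$cor
rho_yx <- stats::cancor(y, x)$cor
r2_xy <- 1 - prod(1 - rho_xy^2)
r2_yx <- 1 - prod(1 - rho_yx^2)
all.equal(r2_xy, r2_yx)
```

Because the canonical correlations do not depend on which set is labeled x and which y, any statistic built only from them is symmetric in the same way.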

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
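As a rough illustration of the conversion functions described in this section, the sketch below builds a correlation matrix from R's built-in mtcars data (a stand-in for your own data, not a data set used in this document) and passes it to cor2latex and df2latex. The calls are guarded so that the block still runs when psych is not installed; the heading and caption strings are arbitrary.

```r
# Correlation matrix from a built-in data set (a stand-in for your own data)
R <- cor(datasets::mtcars[, c("mpg", "disp", "hp", "wt")])

if (requireNamespace("psych", quietly = TRUE)) {
  # Lower-triangle correlation table, printed to the console as LaTeX source
  psych::cor2latex(R,
                   heading = "Correlations among four mtcars variables",
                   caption = "cor2latex")

  # A generic data frame can be converted in the same way
  psych::df2latex(head(datasets::mtcars[, 1:4]), caption = "df2latex")
}
```

The printed LaTeX can be pasted into a document directly; both functions also accept a file argument if you would rather write the table to disk.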


                                            7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
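Several of the helpers listed above implement short, standard formulas. The following base R sketch shows the arithmetic behind four of them. The function names (fisher_z, geometric_mean, harmonic_mean, super_matrix, reverse_item) are hypothetical stand-ins written for this illustration; they are not the psych implementations and omit the argument checking and options that the package versions provide.

```r
# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r))
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))

# Geometric mean (for ratios and growth rates) and harmonic mean (for rates)
geometric_mean <- function(x) exp(mean(log(x)))
harmonic_mean  <- function(x) 1 / mean(1 / x)

# "Super matrix": A on the top left, B on the lower right, 0s elsewhere
super_matrix <- function(A, B) {
  top    <- cbind(A, matrix(0, nrow(A), ncol(B)))
  bottom <- cbind(matrix(0, nrow(B), ncol(A)), B)
  rbind(top, bottom)
}

# Reverse coding an item scored from min_val to max_val
reverse_item <- function(x, min_val, max_val) (max_val + min_val) - x

z  <- fisher_z(0.5)                 # identical to atanh(0.5)
gm <- geometric_mean(c(1, 2, 4))    # cube root of 1 * 2 * 4
hm <- harmonic_mean(c(1, 2, 4))     # 3 / (1 + 1/2 + 1/4)
S  <- super_matrix(diag(2), matrix(1, 3, 3))
reversed <- reverse_item(c(1, 2, 6), min_val = 1, max_val = 6)
```

With a two-item key A and a three-item key B, super_matrix returns a 5 × 5 matrix whose off-diagonal blocks are zero, which is the layout described for superMatrix.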

                                            8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
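As a sketch of the kind of multidimensional scaling demonstration that a distance matrix such as cities supports, the example below runs classical (metric) scaling with base R's cmdscale. It uses R's built-in eurodist road distances as a stand-in, on the assumption that a city-by-city distance matrix from the package could be passed to cmdscale in the same way.

```r
# Classical (metric) multidimensional scaling on a distance object.
# eurodist ships with base R and stands in here for a matrix such as cities.
fit <- stats::cmdscale(datasets::eurodist, k = 2)

# Collect the two recovered dimensions in a data frame for inspection
coords <- data.frame(city = rownames(fit),
                     dim1 = fit[, 1],
                     dim2 = fit[, 2],
                     row.names = NULL)
head(coords)

# A quick map of the recovered configuration
plot(coords$dim1, coords$dim2, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(coords$dim1, coords$dim2, labels = coords$city, cex = 0.7)
```

The orientation of the recovered axes is arbitrary, so the resulting map may be rotated or reflected relative to a geographic one.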

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")


                                            10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                            11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                            References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.



                                                                • Converting output to APA style tables using LaTeX
                                                                • Miscellaneous functions
                                                                • Data sets
                                                                • Development version and a users guide
                                                                • Psychometric Theory
                                                                • SessionInfo

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both “Energetic Arousal” and “Tense Arousal” can be seen in one graph and compared to the same movie manipulations on “Positive Affect” and “Negative Affect” (Figure 9). Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+    cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+    ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Plot, two panels. Left: "Movies effect on arousal" (x axis: Energetic Arousal; y axis: Tense Arousal). Right: "Movies effect on affect" (x axis: Positive Affect; y axis: Negative Affect). Points in both panels are labeled Sad, Horror, Neutral, and Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV  SATQ
education        NA  0.09 0.00 -0.05  0.05
age            0.61    NA 0.07 -0.03  0.13
ACT            0.16  0.15   NA  0.08  0.02
SATV           0.02 -0.06 0.61    NA  0.05
SATQ           0.08  0.04 0.60  0.68    NA
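The below/above diagonal layout can be mimicked in base R. The following is only a minimal sketch of the idea, not the psych implementation: the helper name combine_lower_upper is hypothetical, and the built-in mtcars data (split by the am variable) stands in for the two groups.

```r
# Hypothetical helper (not part of psych): keep `lower` below the diagonal
# and place `upper` (or lower - upper when diff = TRUE) above the diagonal.
combine_lower_upper <- function(lower, upper, diff = FALSE) {
  stopifnot(all(dim(lower) == dim(upper)))
  out <- lower
  above <- upper.tri(out)
  out[above] <- if (diff) lower[above] - upper[above] else upper[above]
  diag(out) <- NA   # the diagonal carries no information in this display
  out
}

# Two groups from a built-in data set (an assumption made for illustration).
vars <- c("mpg", "wt", "hp", "qsec")
r_auto   <- cor(mtcars[mtcars$am == 0, vars], use = "pairwise.complete.obs")
r_manual <- cor(mtcars[mtcars$am == 1, vars], use = "pairwise.complete.obs")

both  <- combine_lower_upper(r_auto, r_manual)
diffs <- combine_lower_upper(r_auto, r_manual, diff = TRUE)
print(round(both, 2))
print(round(diffs, 2))
```

Because correlation matrices are symmetric, the entries above the diagonal of the first matrix equal those below it, which is why the difference can be formed directly from the upper triangles.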

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a “heat map” of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use “spider” plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
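The mapping from a correlation to a radial length described above can be sketched in a few lines of base R. This is an illustration of the stated rule only (length 0 for r = -1, length 1 for r = 1), not the code used by spider; the function name radial_length and the idealized cosine-shaped circumplex are assumptions.

```r
# Assumed linear mapping taken from the text: r = -1 -> 0, r = 1 -> 1.
radial_length <- function(r) (r + 1) / 2

# Idealized circumplex (an assumption): 24 variables equally spaced on a
# circle, correlating as the cosine of their angular separation.
n_var  <- 24
angles <- 2 * pi * (seq_len(n_var) - 1) / n_var
r_with_first <- cos(angles - angles[1])

# End point of each spoke for the first variable's spider plot.
len <- radial_length(r_with_first)
spokes <- data.frame(variable = seq_len(n_var),
                     r = round(r_with_first, 2),
                     x = len * cos(angles),
                     y = len * sin(angles))
head(spokes)
```

Joining the spoke end points in order traces the closed shape that a spider plot fills in.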

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these raw probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
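The difference between raw and adjusted probabilities can be seen with base R alone. The sketch below is not how corr.test is implemented; it simply applies cor.test to each pair of variables and then passes the raw p values to p.adjust. The built-in mtcars data and the choice of four variables are assumptions made for illustration.

```r
vars  <- c("mpg", "wt", "hp", "qsec")
pairs <- t(combn(vars, 2))   # one row per unique pair of variables

# Raw two-tailed p value for each pairwise Pearson correlation.
raw_p <- apply(pairs, 1, function(v) {
  cor.test(mtcars[[v[1]]], mtcars[[v[2]]])$p.value
})

# Holm (1979) adjustment across the whole family of tests.
holm_p <- p.adjust(raw_p, method = "holm")

result <- data.frame(x = pairs[, 1], y = pairs[, 2],
                     raw = signif(raw_p, 3), holm = signif(holm_p, 3))
print(result)
```

The adjusted values are never smaller than the raw ones, which is why the entries above the diagonal in the probability matrix are at least as large as their counterparts below it.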

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),


> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability 0.23
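The first two tests listed above rest on textbook formulas and can be checked by hand in base R. The sketch below is not the r.test source; it uses the standard t test for a single correlation and the Fisher z test for two independent correlations, with the same inputs as cases 1 and 2. The variable names are chosen only for this illustration.

```r
# Case 1: does a single correlation differ from zero?
n <- 50
r <- 0.3
t_value  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_single <- 2 * pt(-abs(t_value), df = n - 2)

# Case 2: do two correlations from independent samples differ?
# Fisher z transform each r; the standard error of the difference
# is sqrt(1/(n1 - 3) + 1/(n2 - 3)).
n1 <- 30; n2 <- 30
r12 <- 0.4; r34 <- 0.6
z_diff <- (atanh(r12) - atanh(r34)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p_diff <- 2 * pnorm(-abs(z_diff))

round(c(t = t_value, p = p_single), 3)
round(c(z = abs(z_diff), p = p_diff), 2)
```

These hand calculations should agree, to rounding, with the t value of 2.18 and the z value of 0.99 printed in cases 1 and 2. The dependent-correlation tests (cases 3 and 4) need the additional correlations among the variables and are not sketched here.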

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
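The logic of this test can be illustrated with a few lines of base R. The sketch below is a simplified version of the idea, not the cortest source: the helper name chisq_all_zero is hypothetical, the data are simulated, and scaling the sum of squared Fisher z values by n - 3 is an assumption based on the usual variance of a Fisher z.

```r
# Hypothetical helper: chi-square test that all population correlations
# are zero, using the Fisher z transformed off-diagonal correlations.
chisq_all_zero <- function(R, n) {
  z    <- atanh(R[lower.tri(R)])   # Fisher z of each unique correlation
  chi2 <- (n - 3) * sum(z^2)       # assumed scaling (see lead-in)
  df   <- length(z)                # p * (p - 1) / 2
  c(chi2 = chi2, df = df, p = pchisq(chi2, df, lower.tail = FALSE))
}

set.seed(1)
n <- 200
independent <- matrix(rnorm(n * 5), ncol = 5)   # five unrelated variables
related <- independent
related[, 2] <- related[, 1] + rnorm(n)          # make column 2 depend on column 1

res_independent <- chisq_all_zero(cor(independent), n)
res_related     <- chisq_all_zero(cor(related), n)
print(round(res_independent, 3))
print(signif(res_related, 3))
```

With five variables there are 10 unique correlations, hence 10 degrees of freedom, just as the six variables of sat.act give the df of 15 reported above.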

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
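The underestimation by φ is easy to demonstrate by simulation in base R. The sketch below is not the estimation procedure used for tetrachoric correlations in general; it covers only the special case in which both variables are cut at their medians, where the bivariate normal implies φ = (2/π) asin(ρ) and ρ can therefore be recovered in closed form. The sample size and seed are arbitrary choices.

```r
set.seed(123)
n   <- 100000
rho <- 0.5

# Latent bivariate normal variables with correlation rho.
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Dichotomize both at 0 (the median) and correlate the 0/1 scores: phi.
phi <- cor(as.numeric(x > 0), as.numeric(y > 0))

# Median cuts only: invert phi = (2 / pi) * asin(rho).
rho_hat <- sin(pi * phi / 2)

round(c(latent = cor(x, y), phi = phi, recovered = rho_hat), 2)
```

For ρ = 0.5 the expected φ under median cuts is 1/3, matching the values rho = 0.5 and phi = 0.33 shown in Figure 14. With cuts away from the median no such closed form applies and the correlation has to be found numerically.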

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a “smoothed” correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
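The smoothing steps just described (lift the offending eigenvalues, rescale, rebuild) can be sketched in base R. This is an illustration of the idea under stated assumptions, not the cor.smooth source: the helper name smooth_cor, the eigenvalue floor of 1e-8, and the small made-up matrix are all hypothetical.

```r
# Hypothetical helper: eigenvalue smoothing of an improper correlation matrix.
smooth_cor <- function(R, min_eigen = 1e-8) {
  e    <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, min_eigen)        # lift non-positive eigenvalues
  vals <- vals * nrow(R) / sum(vals)       # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  S <- cov2cor(S)                          # restore a unit diagonal
  dimnames(S) <- dimnames(R)
  S
}

# An impossible pattern: V1 correlates .9 with both V2 and V3,
# yet V2 and V3 correlate -.9 with each other.
R <- matrix(c(1.0,  0.9,  0.9,
              0.9,  1.0, -0.9,
              0.9, -0.9,  1.0), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("V", 1:3), paste0("V", 1:3)))

eigen(R, symmetric = TRUE)$values          # one eigenvalue is negative
R_smooth <- smooth_cor(R)
round(R_smooth, 2)
eigen(R_smooth, symmetric = TRUE)$values   # all positive after smoothing
```

The smoothed matrix is the nearest proper correlation matrix only in a loose sense; it changes every off-diagonal element a little, so it is worth comparing it with the original before using it.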

                                              4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()

[Plot: a bivariate normal distribution with rho = 0.5 and phi = 0.33, cut at τ on X and at Τ on Y into four regions (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with the marginal normal densities, dnorm(x), drawn along the axes.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))

[Plot: perspective view of the bivariate normal density (axes x, y, z), titled "Bivariate density rho = 0.5".]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups ($r_{wg}$) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values, or the group means.
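Equation 1 can be verified numerically in base R. The sketch below is not the statsBy source; it simulates two variables in five unequal groups (an arbitrary choice), splits each score into its group mean and its deviation from that mean, and checks that the two products on the right hand side add up to the overall correlation. All variable names are chosen for this illustration.

```r
set.seed(42)
g <- rep(1:5, times = c(10, 20, 15, 30, 25))   # group membership, unequal ns
x <- rnorm(length(g)) + g
y <- 0.5 * x + rnorm(length(g)) - 0.8 * g

# Between part: each person's group mean. Within part: deviation from it.
x_bg <- ave(x, g); x_wg <- x - x_bg
y_bg <- ave(y, g); y_wg <- y - y_bg

r_xy   <- cor(x, y)                  # overall correlation
r_wg   <- cor(x_wg, y_wg)            # pooled within group correlation
r_bg   <- cor(x_bg, y_bg)            # between group correlation (weighted by n)
eta_wg <- c(x = cor(x, x_wg), y = cor(y, y_wg))
eta_bg <- c(x = cor(x, x_bg), y = cor(y, y_bg))

rhs <- eta_wg[["x"]] * eta_wg[["y"]] * r_wg +
       eta_bg[["x"]] * eta_bg[["y"]] * r_bg
round(c(r_xy = r_xy, within = r_wg, between = r_bg, reconstructed = rhs), 3)
```

Because the group means are repeated for every member of a group, the between group correlation here is implicitly weighted by group size, which is what makes the identity hold exactly.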

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
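Why a correlation matrix is enough for multiple regression can be shown with standard matrix algebra in base R. The sketch below is not the setCor source; it solves the normal equations on a correlation matrix and cross-checks the result against lm run on standardized raw data. The built-in mtcars data and the particular predictors are assumptions made for illustration.

```r
vars_x <- c("wt", "hp", "disp")
vars_y <- "mpg"
R <- cor(mtcars[, c(vars_y, vars_x)])

Rxx <- R[vars_x, vars_x]   # correlations among the predictors
rxy <- R[vars_x, vars_y]   # correlations of the predictors with the criterion

beta <- setNames(solve(Rxx, rxy), vars_x)   # standardized regression weights
R2   <- sum(beta * rxy)                     # squared multiple correlation
vif  <- diag(solve(Rxx))                    # 1 / (1 - SMC) for each predictor

# Cross-check against lm() on standardized raw data.
z   <- as.data.frame(scale(mtcars[, c(vars_y, vars_x)]))
fit <- lm(mpg ~ wt + hp + disp, data = z)

round(rbind(from_R = beta, from_lm = coef(fit)[vars_x]), 3)
round(c(R2_from_R = R2, R2_from_lm = summary(fit)$r.squared), 3)
round(vif, 2)
```

The same algebra works when only a published correlation matrix is available, which is the situation the matrix input to setCor is meant for. Standard errors additionally require the sample size.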

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.69              0.63              0.50              0.58
     Letter.Group
             0.48

multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.48              0.40              0.25              0.34
     Letter.Group
             0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.59              0.58              0.49              0.58
     Letter.Group
             0.45

Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.34              0.34              0.24              0.33
     Letter.Group
             0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
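The observation that the weights of the remaining predictors do not change when other predictors are treated as covariates follows from a general property of least squares. The sketch below illustrates it in base R with a simulated correlation matrix: the covariates z are partialled out of both x and y with the usual partial covariance formula, and the weights from the partialled matrix are compared with the weights from the full regression. The variable names and data are hypothetical, and treating the partialled matrix without restandardizing is an assumption of this sketch rather than a description of the setCor internals.

```r
set.seed(2)
n  <- 300
z1 <- rnorm(n); z2 <- rnorm(n)                  # covariates
x1 <- 0.5 * z1 + rnorm(n)                       # predictors
x2 <- 0.3 * z2 + 0.2 * x1 + rnorm(n)
y  <- 0.4 * z1 + 0.3 * x1 + 0.5 * x2 + rnorm(n) # criterion
R  <- cor(cbind(z1, z2, x1, x2, y))

zi <- 1:2; xi <- 3:4; yi <- 5

# Weights of x1 and x2 when all four predictors are entered together
all_pred  <- c(zi, xi)
beta_full <- solve(R[all_pred, all_pred], R[all_pred, yi])[3:4]

# Partial z out of x and y: R* = R_ww - R_wz Rzz^-1 R_zw, with w = (x, y)
w  <- c(xi, yi)
Rp <- R[w, w] - R[w, zi] %*% solve(R[zi, zi], R[zi, w])

# Regress the residualized y on the residualized x
beta_part <- solve(Rp[1:2, 1:2], Rp[1:2, 3])

rbind(full = beta_full, partialled = beta_part)
```

The weights match, but the variance left in y for x to explain is smaller once z has been removed, which is why the multiple correlation drops.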

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.58              0.46              0.21              0.18
     Letter.Group
             0.30

multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
            0.331             0.210             0.043             0.032
     Letter.Group
            0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.44              0.35              0.17              0.14
     Letter.Group
             0.26

Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.19              0.12              0.03              0.02
     Letter.Group
             0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees
Four.Letter.Words              0.52     0.11          0.09      0.06
Suffixes                       0.11     0.60         -0.01      0.01
Letter.Series                  0.09    -0.01          0.75      0.28
Pedigrees                      0.06     0.01          0.28      0.66
Letter.Group                   0.13     0.03          0.37      0.20
                  Letter.Group
Four.Letter.Words         0.13
Suffixes                  0.03
Letter.Series             0.37
Pedigrees                 0.20
Letter.Group              0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
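The path logic can be sketched with three ordinary regressions in base R. In the sketch below the data are simulated, the names c_total and c_direct are chosen here to separate the effect of x ignoring m from the effect of x holding m constant, and the percentile bootstrap interval is one common choice that is assumed for illustration; the mediate function has its own options and defaults, which are not reproduced here.

```r
set.seed(3)
n <- 100
x <- rnorm(n)                        # predictor
m <- 0.6 * x + rnorm(n)              # mediator
y <- 0.3 * x + 0.5 * m + rnorm(n)    # criterion
dat <- data.frame(x, m, y)

# Point estimates of the paths for one data set
paths <- function(d) {
  a        <- coef(lm(m ~ x, data = d))["x"]     # x -> m
  fit_y    <- lm(y ~ x + m, data = d)
  b        <- coef(fit_y)["m"]                   # m -> y, holding x constant
  c_direct <- coef(fit_y)["x"]                   # x -> y, holding m constant
  c_total  <- coef(lm(y ~ x, data = d))["x"]     # x -> y, ignoring m
  c(a = unname(a), b = unname(b), ab = unname(a * b),
    c_direct = unname(c_direct), c_total = unname(c_total))
}
est <- paths(dat)

# Bootstrap the indirect effect: resample rows, re-estimate ab each time
n_boot  <- 500                       # kept small for the sketch
boot_ab <- replicate(n_boot, paths(dat[sample(n, replace = TRUE), ])["ab"])
ci <- quantile(boot_ab, c(0.025, 0.975))   # percentile interval

round(est, 2)
round(ci, 2)
```

In least squares the total effect is exactly the direct effect plus ab, so the indirect effect can be read either as the product of the two paths or as the drop in the x weight when m is added. The bootstrap is preferred for testing ab because the product of two estimates is not normally distributed in small samples.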


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

Total Direct effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t direct =  2.5   with probability =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t direct =  1.35   with probability =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB   =  0.33
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.04    Upper CI =  0.69
R2 of model =  0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Path diagram "Mediation model": THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS c = 0.76, c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (.43 for the direct path once Attribution is removed), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Path diagram "Regression Models": THERAPY to SATIS = 0.43; ATTRIB to SATIS = 0.4; 0.21 on the path linking THERAPY and ATTRIB]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
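The contrast between the set correlation and the average squared canonical correlation can be seen in a small base R sketch. It uses the standard result that the squared canonical correlations are the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx; the simulated data, in which only one x variable and one y variable share variance, and all object names are assumptions for the illustration and not the setCor implementation.

```r
set.seed(4)
n <- 500
f <- rnorm(n)                                    # a shared source of variance
X <- cbind(x1 = f + rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
Y <- cbind(y1 = f + rnorm(n), y2 = rnorm(n))
R <- cor(cbind(X, Y))
xi <- 1:3; yi <- 4:5

Rxx <- R[xi, xi]; Ryy <- R[yi, yi]; Rxy <- R[xi, yi]

# Squared canonical correlations: eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx
M      <- solve(Rxx, Rxy) %*% solve(Ryy, t(Rxy))
lambda <- Re(eigen(M)$values)[seq_len(min(length(xi), length(yi)))]

set_R2  <- 1 - prod(1 - lambda)     # set correlation
mean_cc <- mean(lambda)             # average squared canonical correlation

# Swapping the roles of the two sets gives the same set correlation
M_rev      <- solve(Ryy, t(Rxy)) %*% solve(Rxx, Rxy)
set_R2_rev <- 1 - prod(1 - Re(eigen(M_rev)$values))

round(c(set_R2 = set_R2, reversed = set_R2_rev, mean_cc = mean_cc), 3)
```

Because the product is dominated by the largest eigenvalue, a single strongly related pair of variables is enough to produce a sizeable set correlation, while the average squared canonical correlation is pulled down by the unrelated dimensions.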

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was  SATQ . The IV (X) was  ACT gender ACTXgndr . The mediating variable(s) =  education .

Total Direct effect(c) of  ACT  on  SATQ  =  0.58   S.E. =  0.03  t direct =  19.25   with probability =  0
Direct effect (c') of  ACT  on  SATQ  removing  education  =  0.59   S.E. =  0.03  t direct =  19.26   with probability =  0
Indirect effect (ab) of  ACT  on  SATQ  through  education   =  -0.01
Mean bootstrapped indirect effect =  -0.01  with standard error =  0.01  Lower CI =  -0.02    Upper CI =  0
Total Direct effect(c) of  gender  on  SATQ  =  -0.14   S.E. =  0.03  t direct =  -4.78   with probability =  2.1e-06
Direct effect (c') of  gender  on  NA  removing  education  =  -0.14   S.E. =  0.03  t direct =  -4.63   with probability =  4.4e-06
Indirect effect (ab) of  gender  on  SATQ  through  education   =  0
Mean bootstrapped indirect effect =  -0.01  with standard error =  0.01  Lower CI =  -0.01    Upper CI =  0
Total Direct effect(c) of  ACTXgndr  on  SATQ  =  0   S.E. =  0.03  t direct =  0.02   with probability =  0.99
Direct effect (c') of  ACTXgndr  on  NA  removing  education  =  0   S.E. =  0.03  t direct =  0.01   with probability =  0.99
Indirect effect (ab) of  ACTXgndr  on  SATQ  through  education   =  0
Mean bootstrapped indirect effect =  -0.01  with standard error =  0.01  Lower CI =  0    Upper CI =  0
R2 of model =  0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Path diagram "Moderation model": ACT, gender, and the product term ACTXgndr predict SATQ, with education as the mediator; the c, c', a, and b path values are those listed in the output above.]

Figure 18: Moderated multiple regression requires the raw data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. There are two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the covariance matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation =  0.03
Cohen's Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
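Once the number of observations is known, the overall test of a multiple regression needs only R2, n, and the number of predictors k. The base R sketch below applies the textbook formulas, F = (R2/k)/((1 - R2)/(n - k - 1)) and shrunken R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), to the ACT regression (R2 = 0.0272, n = 700, k = 3). That setCor uses exactly these expressions internally is an assumption; the sketch only shows that they agree, to rounding, with the F, the degrees of freedom, and the adjusted R2 that lm reports for the same model.

```r
R2 <- 0.0272    # squared multiple correlation for ACT
n  <- 700       # number of observations
k  <- 3         # number of predictors (gender, education, age)

df1 <- k
df2 <- n - k - 1
F_stat  <- (R2 / df1) / ((1 - R2) / df2)
p_value <- pf(F_stat, df1, df2, lower.tail = FALSE)
R2_adj  <- 1 - (1 - R2) * (n - 1) / df2     # shrunken (adjusted) R2

round(c(F = F_stat, df1 = df1, df2 = df2, R2_adj = R2_adj), 4)
signif(p_value, 3)
```

This is why a correlation or covariance matrix by itself yields only descriptive statistics: without n there are no degrees of freedom for the denominator.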

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00


                                              7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
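Several of these helpers wrap short textbook formulas. The base R sketch below writes out the arithmetic behind the Fisher z transformation, the geometric and harmonic means, and a two-block super matrix. It is meant only to show what is being computed; the object names are hypothetical, and the psych functions themselves may handle arguments, missing data, and more than two matrices differently.

```r
# Fisher r-to-z transformation and its inverse
r <- c(0.1, 0.5, 0.9)
z <- 0.5 * log((1 + r) / (1 - r))      # same as atanh(r)
r_back <- tanh(z)                      # back to correlations

# Geometric and harmonic means of positive values
x <- c(1, 2, 4, 8)
geo_mean  <- exp(mean(log(x)))         # exp of the mean log
harm_mean <- 1 / mean(1 / x)           # reciprocal of the mean reciprocal

# A "super matrix": A in the upper left, B in the lower right, zeros elsewhere
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))

round(z, 3)
c(geometric = geo_mean, harmonic = harm_mean)
S
```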

                                              8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

                                              Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factorand uncorrelated group factors The Holzinger correlation matrix is a 14 14 matrixfrom their paper The Thurstone correlation matrix is a 9 9 matrix of correlationsof ability items The Reise data set is 16 16 correlation matrix of mental healthitems The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests

                                              bfi 25 personality self report items taken from the International Personality Item Pool(ipiporiorg) were included as part of the Synthetic Aperture Personality Assessment(SAPA) web based personality assessment project The data from 2800 subjects areincluded here as a demonstration set for scale construction factor analysis and ItemResponse Theory analyses

                                              satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis

                                              epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
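As a rough sketch of the kind of scaling demonstration a distance matrix such as cities allows, classical multidimensional scaling can be run with cmdscale from core R. The example below uses eurodist, a distance data set that ships with R, purely as a stand-in so that it runs without psych; the choice of two dimensions is also an assumption.

```r
# Classical (metric) multidimensional scaling on a distance matrix.
# eurodist (road distances between 21 European cities) ships with base R
# and is used here only as a stand-in for the psych cities data.
fit <- cmdscale(eurodist, k = 2)          # 21 x 2 matrix of coordinates
colnames(fit) <- c("dim1", "dim2")

# Distances in the recovered 2-d space should track the original distances.
recovered <- dist(fit)
agreement <- cor(as.vector(eurodist), as.vector(recovered))

# Plot the configuration with the city names as point labels.
plot(fit, type = "n", xlab = "Dimension 1", ylab = "Dimension 2",
     main = "MDS of eurodist")
text(fit, labels = rownames(fit), cex = 0.7)
```

The orientation of the recovered axes is arbitrary, so the plot may appear rotated or reflected relative to a map.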

                                              9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


                                              10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                              11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                              References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+     cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+     ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure: two panels. Left: "Movies effect on arousal" (x axis Energetic Arousal, y axis Tense Arousal). Right: "Movies effect on affect" (x axis Positive Affect, y axis Negative Affect). Points in both panels are labeled Sad, Horror, Neutral, and Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).
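The logic of a two dimensional display of means and errors can be sketched in base R: find the mean and standard error of two variables within each group, then draw each group as a point with horizontal and vertical error bars. This is only an illustration under stated assumptions, not how errorCircles is implemented: the helper group_stats is hypothetical, and the iris data (with Species in the role of Film) stand in for the affect data.

```r
# Illustrative sketch, not the errorCircles implementation.
# iris (base R) stands in for the affect data; Species plays the role of Film.
group_stats <- function(x, g) {
  m <- tapply(x, g, mean)
  data.frame(mean = as.vector(m),
             se   = as.vector(tapply(x, g, function(v) sd(v) / sqrt(length(v)))),
             n    = as.vector(tapply(x, g, length)),
             row.names = names(m))
}
sx <- group_stats(iris$Sepal.Length, iris$Species)
sy <- group_stats(iris$Sepal.Width,  iris$Species)

# Point size reflects the relative sample size of each group.
plot(sx$mean, sy$mean, pch = 16, cex = 2 * sqrt(sx$n / max(sx$n)),
     xlim = range(sx$mean - sx$se, sx$mean + sx$se),
     ylim = range(sy$mean - sy$se, sy$mean + sy$se),
     xlab = "Sepal length", ylab = "Sepal width",
     main = "Group means with standard errors")
# Horizontal and vertical bars of one standard error around each mean.
arrows(sx$mean - sx$se, sy$mean, sx$mean + sx$se, sy$mean,
       angle = 90, code = 3, length = 0.05)
arrows(sx$mean, sy$mean - sy$se, sx$mean, sy$mean + sy$se,
       angle = 90, code = 3, length = 0.05)
text(sx$mean, sy$mean, labels = rownames(sx), pos = 3)
```

Keeping the group statistics in a data frame mirrors the pattern in the text, where the statistics from one call are reused to draw a second display.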


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ 
gender     1.00                              
education  0.09  1.00                        
age       -0.02  0.55  1.00                  
ACT       -0.04  0.15  0.11  1.00            
SATV      -0.02  0.05 -0.04  0.56  1.00      
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
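The lower off diagonal display can be imitated in base R. What follows is a minimal sketch under stated assumptions, not the lowerCor implementation: the helper name lower_tri_display and the simulated data are made up for illustration.

```r
# Hypothetical helper (not part of psych): round a correlation matrix and
# blank out everything above the diagonal before printing.
lower_tri_display <- function(R, digits = 2) {
  out <- format(round(R, digits), nsmall = digits)  # character matrix, dimnames kept
  out[upper.tri(out)] <- ""                         # hide the redundant upper triangle
  noquote(out)
}

set.seed(42)
x <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, c("A", "B", "C")))
R <- cor(x, use = "pairwise", method = "pearson")   # the defaults named in the text
shown <- lower_tri_display(R)
print(shown)
```

The full matrix R is still available for later computation; only the printed copy is truncated, which mirrors the "returns (invisibly) the full correlation matrix" behaviour described above.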

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00                        
age        0.61  1.00                  
ACT        0.16  0.15  1.00            
SATV       0.02 -0.06  0.61  1.00      
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00                        
age        0.52  1.00                  
ACT        0.16  0.08  1.00            
SATV       0.07 -0.03  0.53  1.00      
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.09 0.00 -0.05  0.05
age            0.61    NA 0.07 -0.03  0.13
ACT            0.16  0.15   NA  0.08  0.02
SATV           0.02 -0.06 0.61    NA  0.05
SATQ           0.08  0.04 0.60  0.68    NA
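The layout of the combined matrix can be sketched in a few lines of base R. The helper combine_lower_upper is hypothetical (it is not the lowerUpper source), and it assumes both inputs are full symmetric matrices; the numbers are the education, age, and ACT correlations printed above.

```r
# Hypothetical helper (not the psych implementation): put `lower` below the
# diagonal and either `upper` or the difference (lower - upper) above it.
combine_lower_upper <- function(lower, upper, diff = FALSE) {
  stopifnot(all(dim(lower) == dim(upper)))
  out <- lower
  above <- upper.tri(out)
  out[above] <- if (diff) lower[above] - upper[above] else upper[above]
  diag(out) <- NA                      # the diagonal carries no information here
  out
}

vars <- c("education", "age", "ACT")
male   <- matrix(c(1, .61, .16,  .61, 1, .15,  .16, .15, 1), 3,
                 dimnames = list(vars, vars))
female <- matrix(c(1, .52, .16,  .52, 1, .08,  .16, .08, 1), 3,
                 dimnames = list(vars, vars))

both  <- combine_lower_upper(male, female)
diffs <- combine_lower_upper(male, female, diff = TRUE)
round(diffs, 2)
```

With diff = TRUE the entries above the diagonal are the first matrix minus the second, which agrees with the printed output (for example, 0.61 - 0.52 = 0.09 for education and age).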

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, the raw probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
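To make the difference between raw and adjusted probabilities concrete, the following sketch applies the Holm correction to a handful of p values, once by hand and once with p.adjust from the stats package. The p values are made-up illustrative numbers, not output from corr.test.

```r
# Raw two-tailed p values for a small set of correlations (illustrative numbers)
raw_p <- c(0.001, 0.012, 0.030, 0.040, 0.600)

# Holm (1979): sort ascending, multiply the i-th smallest by (m - i + 1),
# then enforce monotonicity and cap at 1.
m <- length(raw_p)
ord <- order(raw_p)
by_hand <- pmin(1, cummax((m - seq_len(m) + 1) * raw_p[ord]))[order(ord)]

holm <- p.adjust(raw_p, method = "holm")   # stats::p.adjust gives the same values
cbind(raw = raw_p, holm = holm)
```

Note that the fourth value (0.040) would be 0.08 after multiplication by 2 but is raised to 0.09 so that the adjusted values never decrease as the raw values increase.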

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),


> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device 
          1 

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main='24 variables in a circumplex')
> dev.off()
null device 
          1 

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device 
          1 

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix 
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size 
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option


depending upon the input):

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests 
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a  correlation 
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests 
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations 
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests 
Call:[1] "r.test(n =  103 , r12 =  0.4 , r23 =  0.1 , r13 =  0.5 )"
Test of difference between two correlated  correlations 
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)  #steiger Case B
Correlation tests 
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5, 
    r24 = 0.8)
Test of difference between two dependent correlations 
 z value -1.2    with probability  0.23
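The first three tests can be reproduced by hand in base R, which makes explicit what each statistic is. This is a sketch, not the r.test source: the variable names are arbitrary, and the case 3 formula is the one commonly given for Williams' t for two dependent correlations that share a variable; it reproduces the t value printed above.

```r
# Case 1: is a single correlation different from zero?
n <- 50; r <- 0.3
t1 <- r * sqrt(n - 2) / sqrt(1 - r^2)
p1 <- 2 * pt(-abs(t1), df = n - 2)
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))  # Fisher z interval

# Case 2: two independent correlations, each based on 30 cases
n2 <- 30; ra <- 0.4; rb <- 0.6
z2 <- (atanh(rb) - atanh(ra)) / sqrt(1 / (n2 - 3) + 1 / (n2 - 3))
p2 <- 2 * pnorm(-abs(z2))

# Case 3: two dependent correlations sharing a variable (r12 versus r13)
n3 <- 103; r12 <- 0.4; r13 <- 0.5; r23 <- 0.1
detR <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23   # determinant of the 3 x 3 matrix
rbar <- (r12 + r13) / 2
t3 <- (r12 - r13) * sqrt((n3 - 1) * (1 + r23) /
        (2 * ((n3 - 1) / (n3 - 3)) * detR + rbar^2 * (1 - r23)^3))
p3 <- 2 * pt(-abs(t3), df = n3 - 3)

round(c(t1 = t1, p1 = p1, z2 = z2, p2 = p2, t3 = t3, p3 = p3), 3)
round(ci, 2)
```

The sign of z2 depends only on which correlation is subtracted from which; the two-tailed probability is unaffected.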

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices 
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
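The idea behind this test can be sketched in base R. The helper chisq_all_zero below is hypothetical and only illustrates the Fisher z version of the statistic; it is not the cortest source and is not guaranteed to reproduce its output exactly. The data are simulated.

```r
# Hypothetical helper (a sketch of the idea, not the psych implementation)
chisq_all_zero <- function(R, n) {
  z  <- atanh(R[lower.tri(R)])         # Fisher z of each unique off-diagonal r
  chi2 <- (n - 3) * sum(z^2)           # each z has variance 1/(n - 3) under the null
  df <- length(z)                      # p * (p - 1) / 2 unique correlations
  c(chi2 = chi2, df = df, p = pchisq(chi2, df, lower.tail = FALSE))
}

set.seed(1)
n <- 200
noise <- matrix(rnorm(n * 6), ncol = 6)   # six unrelated variables
related <- noise + rnorm(n)               # add one shared component to every column
res_noise   <- chisq_all_zero(cor(noise), n)
res_related <- chisq_all_zero(cor(related), n)
round(rbind(res_noise, res_related), 3)
```

With six variables there are 15 unique correlations, which matches the df = 15 reported for sat.act.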

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
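The underestimation can be seen in a small simulation. The sketch below assumes cuts at the median of both latent variables, for which the expected φ has the closed form 2·asin(ρ)/π; the sample size and seed are arbitrary choices, and the inversion shown is valid only for median cuts (the general case requires the kind of optimization that the tetrachoric function performs).

```r
set.seed(123)
n   <- 100000
rho <- 0.5
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)   # latent bivariate normal with correlation rho

# Dichotomize both latent variables at 0 (a median split)
X <- as.numeric(x > 0)
Y <- as.numeric(y > 0)

phi_observed <- cor(X, Y)                      # Pearson r on the 0/1 data is phi
phi_expected <- 2 * asin(rho) / pi             # closed form for cuts at the medians
tetra_recovered <- sin(pi * phi_observed / 2)  # inverts the closed form (median cuts only)

round(c(phi_observed = phi_observed, phi_expected = phi_expected,
        tetra_recovered = tetra_recovered), 2)
```

For ρ = 0.5 the expected φ is 1/3, the same pair of values (rho = 0.5, phi = 0.33) that labels Figure 14.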

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix formed from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
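The smoothing steps described above can be sketched in base R. The helper smooth_cor and its eig_tol argument are assumptions made for illustration, not the cor.smooth source, and the 3 x 3 matrix is a made-up example of an impossible correlation pattern.

```r
# Hypothetical helper sketching the smoothing steps described above
# (not the psych implementation; `eig_tol` is an assumed tolerance).
smooth_cor <- function(R, eig_tol = 1e-12) {
  e <- eigen(R, symmetric = TRUE)
  vals <- e$values
  vals[vals < eig_tol] <- 100 * eig_tol           # make the offending eigen values positive
  vals <- vals * ncol(R) / sum(vals)              # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)  # rebuild the matrix
  S <- cov2cor(S)                                 # restore a unit diagonal
  dimnames(S) <- dimnames(R)
  S
}

# An impossible pattern: V1 correlates .9 with V2 and V3, yet V2 and V3 correlate -.9
bad <- matrix(c(1, .9, .9,  .9, 1, -.9,  .9, -.9, 1), 3,
              dimnames = list(paste0("V", 1:3), paste0("V", 1:3)))
min(eigen(bad)$values)      # negative, so `bad` is not positive semi-definite
fixed <- smooth_cor(bad)
round(fixed, 2)
```

The smoothed matrix keeps the pattern of signs but shrinks the correlations until the matrix is admissible.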

                                                4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use


> draw.tetra()

[Figure 14 image: scatter plot of a bivariate normal distribution (rho = 0.5, phi = 0.33) cut at τ on X and at Τ on Y, with the four regions labeled X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ; the marginal normal densities, dnorm(x), are drawn for X and Y with the areas X > τ and Y > Τ marked]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))

[Figure 15 image: perspective plot titled "Bivariate density rho = 0.5", with axes x, y, and z]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. The tetrachoric correlation is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
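Equation 1 is an exact identity, which can be checked numerically in base R. The sketch below uses simulated data with an arbitrary grouping variable; none of the names come from statsBy, and the between group correlation is computed on group means repeated for each case, so it is weighted by group size.

```r
set.seed(7)
g <- rep(1:10, each = 20)                       # 10 groups of 20 cases
group_effect <- rnorm(10)[g]
x <- group_effect + rnorm(200)
y <- 0.5 * group_effect - 0.4 * (x - group_effect) + rnorm(200)

x_bg <- ave(x, g); y_bg <- ave(y, g)            # group means, repeated for each case
x_wg <- x - x_bg;  y_wg <- y - y_bg             # deviations from the group means

eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)
r_wg <- cor(x_wg, y_wg)                         # pooled within group correlation
r_bg <- cor(x_bg, y_bg)                         # between group correlation (weighted)

r_xy <- cor(x, y)
rebuilt <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
round(c(r_xy = r_xy, rebuilt = rebuilt, r_wg = r_wg, r_bg = r_bg), 3)
```

The data were built so that the within group and between group relationships have opposite signs, which shows why the overall correlation alone does not allow inferences about either level.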

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input 

Beta weights 
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R 
FourLetterWords        Suffixes    LetterSeries       Pedigrees 
           0.69            0.63            0.50            0.58 
    LetterGroup 
           0.48 
multiple R2 
FourLetterWords        Suffixes    LetterSeries       Pedigrees 
           0.48            0.40            0.25            0.34 
    LetterGroup 
           0.23 

Multiple Inflation Factor (VIF) = 1/(1-SMC) = 
     Sentences     Vocabulary SentCompletion   FirstLetters 
          3.69           3.88           3.00           1.35 

Unweighted multiple R 
FourLetterWords        Suffixes    LetterSeries       Pedigrees 
           0.59            0.58            0.49            0.58 
    LetterGroup 
           0.45 
Unweighted multiple R2 
FourLetterWords        Suffixes    LetterSeries       Pedigrees 
           0.34            0.34            0.24            0.33 
    LetterGroup 
           0.20 

Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2 =  0.69
Unweighted correlation between the two sets =  0.73
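The beta weights, multiple R, and VIF values in output of this kind follow from standard matrix algebra on the correlation matrix. The sketch below works through a small made-up correlation matrix with two predictors and one criterion; it illustrates the algebra only and does not call or reproduce setCor.

```r
# A small illustrative correlation matrix (made-up values)
R <- matrix(c(1.0, 0.5, 0.6,
              0.5, 1.0, 0.4,
              0.6, 0.4, 1.0), 3, byrow = TRUE,
            dimnames = list(c("x1", "x2", "y"), c("x1", "x2", "y")))
x <- c("x1", "x2"); y <- "y"

Rxx <- R[x, x]                          # correlations among the predictors
Rxy <- R[x, y, drop = FALSE]            # correlations of predictors with the criterion
beta <- solve(Rxx, Rxy)                 # standardized regression (beta) weights
R2   <- colSums(beta * Rxy)             # squared multiple correlation for each y
vif  <- diag(solve(Rxx))                # 1/(1 - SMC) for each predictor

round(beta, 2)
round(c(R = sqrt(R2), R2 = R2), 2)
round(vif, 2)
```

Because only Rxx and Rxy are needed, the same computation works whether the correlations come from raw data or from a published matrix.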

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input 

Beta weights 
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R 
FourLetterWords        Suffixes    LetterSeries       Pedigrees 
           0.58            0.46            0.21            0.18 
    LetterGroup 
           0.30 
multiple R2 
FourLetterWords        Suffixes    LetterSeries       Pedigrees 
          0.331           0.210           0.043           0.032 
    LetterGroup 
          0.092 

Multiple Inflation Factor (VIF) = 1/(1-SMC) = 
SentCompletion   FirstLetters 
          1.02           1.02 

Unweighted multiple R 
FourLetterWords        Suffixes    LetterSeries       Pedigrees 
           0.44            0.35            0.17            0.14 
    LetterGroup 
           0.26 
Unweighted multiple R2 
FourLetterWords        Suffixes    LetterSeries       Pedigrees 
           0.19            0.12            0.03            0.02 
    LetterGroup 
           0.07 

Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2 =  0.42
Unweighted correlation between the two sets =  0.48
> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees
FourLetterWords            0.52     0.11         0.09      0.06
Suffixes                   0.11     0.60        -0.01      0.01
LetterSeries               0.09    -0.01         0.75      0.28
Pedigrees                  0.06     0.01         0.28      0.66
LetterGroup                0.13     0.03         0.37      0.20
                LetterGroup
FourLetterWords        0.13
Suffixes               0.03
LetterSeries           0.37
Pedigrees              0.20
LetterGroup            0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
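The logic of the indirect effect can be sketched in base R without the mediate function. The data below are simulated purely for illustration (they are not the Preacher and Hayes data), the helper name paths is hypothetical, and the percentile bootstrap shown is one common choice rather than necessarily the exact procedure used inside psych. The sketch writes the total effect as c_total and the direct effect (controlling for m) as c_prime.

```r
# Simulated stand-in data (hypothetical values, chosen only for illustration)
set.seed(42)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)             # x -> m  (the a path)
y <- 0.4 * m + 0.3 * x + rnorm(n)   # m -> y  (the b path) plus a direct path
dat <- data.frame(x, m, y)

# Estimate a, b, the total effect, the direct effect, and the product ab
paths <- function(d) {
  a       <- unname(coef(lm(m ~ x, data = d))["x"])
  fit_y   <- coef(lm(y ~ x + m, data = d))
  b       <- unname(fit_y["m"])
  c_prime <- unname(fit_y["x"])                      # direct effect
  c_total <- unname(coef(lm(y ~ x, data = d))["x"])  # total effect
  c(a = a, b = b, c_total = c_total, c_prime = c_prime, ab = a * b)
}

est <- paths(dat)

# Percentile bootstrap of the indirect effect ab
n_iter <- 500
boot_ab <- replicate(n_iter, {
  idx <- sample(n, replace = TRUE)   # resample rows with replacement
  paths(dat[idx, ])[["ab"]]
})
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(est, 2))
print(round(ci, 2))
```

With ordinary least squares the total effect decomposes exactly into the direct effect plus ab, which is why the reported indirect effect equals the difference between the two "c" estimates.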

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect (c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
         SATIS    se     t    Prob
THERAPY   0.76  0.31   2.5  0.0186

Direct effect estimates (c')
         SATIS    se     t    Prob
THERAPY   0.43  0.32  1.35   0.190
ATTRIB    0.40  0.18  2.23   0.034

'a' effect estimates
        THERAPY    se     t    Prob
ATTRIB     0.82   0.3  2.74  0.0106

'b' effect estimates
         SATIS    se     t    Prob
ATTRIB     0.4  0.18  2.23   0.034

'ab' effect estimates
         SATIS  boot    sd  lower  upper
THERAPY   0.33  0.32  0.17   0.04   0.69

> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model" linking THERAPY, ATTRIB, and SATIS, with path values 0.82 (THERAPY to ATTRIB), 0.4 (ATTRIB to SATIS), c = 0.76, and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct effect, controlling for Attribution, is .43), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2, 3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models" with THERAPY and ATTRIB predicting SATIS; the values shown are 0.43, 0.4, and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example (shown in Figure 18) is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000.
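What a moderated regression estimates can be illustrated with base R alone. The following is a minimal sketch on simulated data (the variables and effect sizes are hypothetical): the moderator enters as a product term, and centering the predictor and moderator before forming the product is assumed here as a common convention, not asserted to be what mediate does internally.

```r
# Simulated stand-in data with a true x-by-moderator interaction
set.seed(1)
n   <- 300
x   <- rnorm(n)
mod <- rbinom(n, 1, 0.5)                       # a two-level moderator
y   <- 0.5 * x + 0.2 * mod + 0.3 * x * mod + rnorm(n)

# Center both variables, then let lm() form the product term
xc   <- x - mean(x)
modc <- mod - mean(mod)
fit_centered <- lm(y ~ xc * modc)

# The same model without centering, for comparison
fit_raw <- lm(y ~ x * mod)

print(round(coef(summary(fit_centered)), 3))
```

Centering changes the lower-order coefficients and their interpretation but leaves the coefficient of the product term unchanged, so the test of moderation itself does not depend on that choice.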

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the $i$th eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{xy}^{\top}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic based upon the average canonical correlation might be more appropriate.
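Both statistics can be computed directly from a correlation matrix. The sketch below uses simulated data (hypothetical variables x1 to x3 and y1 to y2, with only one x related to one y) and assumes the standard canonical-correlation matrix R_xx^-1 R_xy R_yy^-1 R_yx; it is an illustration of the formula, not the setCor source code.

```r
# Simulated data: only x1 and y1 share variance (through g)
set.seed(7)
n <- 500
g <- rnorm(n)
X <- cbind(x1 = g + rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
Y <- cbind(y1 = g + rnorm(n), y2 = rnorm(n))

R  <- cor(cbind(X, Y))
xi <- 1:3
yi <- 4:5
Rxx <- R[xi, xi]
Ryy <- R[yi, yi]
Rxy <- R[xi, yi]

# Eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M, only.values = TRUE)$values)
# Keep the min(p, q) non-trivial values, largest first
lambda <- sort(lambda, decreasing = TRUE)[seq_len(min(length(xi), length(yi)))]

set_R2  <- 1 - prod(1 - lambda)   # Cohen's set correlation
avg_cc2 <- mean(lambda)           # average squared canonical correlation

print(round(c(set_R2 = set_R2, avg_cc2 = avg_cc2), 3))
```

Because the product is dominated by the largest eigenvalue, a single strongly related pair of variables is enough to make the set correlation large, while the average squared canonical correlation stays more modest.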

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation, covariance, or raw data matrix, and thus will report standardized or raw β weights depending on the input. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
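That regression weights can be recovered from a covariance or correlation matrix alone follows from the normal equations. The sketch below shows this in base R, using the built-in mtcars data as a stand-in (an assumption made only so the example is self-contained); it illustrates the algebra and is not the setCor implementation.

```r
# Stand-in data: any complete numeric data frame would do
dat <- mtcars[, c("mpg", "wt", "hp", "qsec")]
x <- c("wt", "hp", "qsec")
y <- "mpg"

# Raw (unstandardized) slopes from the covariance matrix: b = Cxx^-1 Cxy
C     <- cov(dat)
b_raw <- solve(C[x, x], C[x, y])

# Standardized weights, R2, and VIF from the correlation matrix
R     <- cov2cor(C)
b_std <- solve(R[x, x], R[x, y])
R2    <- sum(b_std * R[x, y])       # squared multiple correlation
vif   <- diag(solve(R[x, x]))       # equals 1 / (1 - SMC) for each predictor

# The same model fit to the raw data, for comparison
fit <- lm(mpg ~ wt + hp + qsec, data = dat)

print(round(cbind(from_cov = unname(b_raw), from_lm = unname(coef(fit)[x])), 4))
print(round(c(R2_matrix = R2, R2_lm = summary(fit)$r.squared), 4))
```

Standard errors and significance tests additionally need the number of observations, which is why a matrix-based analysis has to be told the sample size.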

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act, mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect (c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0
Total Direct effect (c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0
Total Direct effect (c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0
R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
           SATQ    se      t      Prob
ACT        0.58  0.03  19.25  0.00e+00
gender    -0.14  0.03  -4.78  2.10e-06
ACTXgndr   0.00  0.03   0.02  9.85e-01

Direct effect estimates (c')
           SATQ    se      t      Prob
ACT        0.59  0.03  19.26  0.00e+00
gender    -0.14  0.03  -4.63  4.37e-06
ACTXgndr   0.00  0.03   0.01  9.92e-01

'a' effect estimates
          education    se      t      Prob
ACT            0.16  0.04   4.22  2.77e-05
gender         0.09  0.04   2.50  1.28e-02
ACTXgndr      -0.01  0.04  -0.15  8.83e-01

'b' effect estimates
            SATQ    se      t   Prob
education  -0.04  0.03  -1.45  0.147

'ab' effect estimates
           SATQ   boot    sd  lower  upper
ACT       -0.01  -0.01  0.01      0      0
gender     0.00   0.00  0.00      0      0
ACTXgndr   0.00   0.00  0.00      0      0

[Figure: path diagram titled "Moderation model" with ACT, gender, and ACTXgndr predicting SATQ through education.]

Figure 18: Moderated multiple regression requires the raw data.

For this example, the analysis is done on the covariance matrix (computed with cov) rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare the lm results with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
             ACT   SATV   SATQ
gender     -0.05  -0.03  -0.18
education   0.14   0.10   0.10
age         0.03  -0.10  -0.09

Multiple R
 ACT  SATV  SATQ
0.16  0.10  0.19

multiple R2
   ACT    SATV    SATQ
0.0272  0.0096  0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender  education   age
  1.01       1.45  1.44

Unweighted multiple R
 ACT  SATV  SATQ
0.15  0.05  0.11

Unweighted multiple R2
 ACT  SATV  SATQ
0.02  0.00  0.01

SE of Beta weights
            ACT  SATV  SATQ
gender     0.18  4.29  4.34
education  0.22  5.13  5.18
age        0.22  5.11  5.16

t of Beta Weights
             ACT   SATV   SATQ
gender     -0.27  -0.01  -0.04
education   0.65   0.02   0.02
age         0.15  -0.02  -0.02

Probability of t <
            ACT  SATV  SATQ
gender     0.79  0.99  0.97
education  0.51  0.98  0.98
age        0.88  0.98  0.99

Shrunken R2
   ACT    SATV    SATQ
0.0230  0.0054  0.0317

Standard Error of R2
   ACT    SATV    SATQ
0.0120  0.0073  0.0137

F
 ACT  SATV  SATQ
6.49  2.26  8.63

Probability of F <
     ACT      SATV      SATQ
2.48e-04  8.08e-02  1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable            MR1    MR2    MR3    h2    u2   com
Sentences          0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary         0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion    0.83   0.04   0.00  0.73  0.27  1.00
First.Letters      0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes           0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series      0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees          0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group      -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings        2.64   1.86   1.5

MR1                1.00   0.59   0.54
MR2                0.59   1.00   0.52
MR3                0.54   0.52   1.00
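To make concrete what such a conversion involves, here is a minimal base R sketch that writes the lower triangle of a correlation matrix as a LaTeX tabular. The helper cor_to_latex is hypothetical and written only for this illustration; it is not the psych cor2latex function and omits its APA-specific formatting options. The mtcars data serve as a stand-in.

```r
# Hypothetical helper (not part of psych): lower triangle of a square
# correlation matrix rendered as the lines of a LaTeX tabular
cor_to_latex <- function(R, digits = 2) {
  stopifnot(is.matrix(R), nrow(R) == ncol(R))
  p    <- nrow(R)
  vars <- if (is.null(colnames(R))) paste0("V", seq_len(p)) else colnames(R)
  cells <- formatC(R, format = "f", digits = digits)  # keeps matrix shape
  cells[upper.tri(cells)] <- ""                       # blank the upper triangle
  rows <- vapply(seq_len(p), function(i) {
    paste0(paste(c(vars[i], cells[i, ]), collapse = " & "), " \\\\")
  }, character(1))
  header <- paste0(paste(c("Variable", vars), collapse = " & "), " \\\\")
  c(paste0("\\begin{tabular}{l", strrep("r", p), "}"),
    "\\hline", header, "\\hline", rows, "\\hline",
    "\\end{tabular}")
}

tab <- cor_to_latex(cor(mtcars[, c("mpg", "wt", "hp")]))
cat(tab, sep = "\n")
```

Variable names containing characters that are special in LaTeX (for example underscores) would need escaping, which this sketch does not attempt.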


                                                7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
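Several of these helpers are thin wrappers around simple formulas. The base R sketches below show the underlying computations for the Fisher z transform, the geometric and harmonic means, and the block-diagonal "super matrix". The function names (fisher_z, geo_mean, har_mean, super_matrix) are hypothetical stand-ins chosen for this illustration; the psych versions handle more cases, such as missing data and more than two matrices.

```r
# Fisher r-to-z transform (identical to atanh) and its inverse
fisher_z     <- function(r) 0.5 * log((1 + r) / (1 - r))
fisher_z_inv <- function(z) tanh(z)

# Geometric mean (positive values) and harmonic mean (non-zero values)
geo_mean <- function(x) exp(mean(log(x)))
har_mean <- function(x) 1 / mean(1 / x)

# Block-diagonal combination of two matrices: A top left, B lower right,
# zeros in the two off-diagonal quadrants
super_matrix <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

z <- fisher_z(0.5)
S <- super_matrix(matrix(1, 2, 2), matrix(2, 3, 1))
print(round(c(z = z, back = fisher_z_inv(z)), 3))
print(c(geometric = geo_mean(c(1, 4)), harmonic = har_mean(c(1, 3))))
print(S)
```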

                                                8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package = "psych")


                                                10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                                References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.




3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png('bibars.png')
> with(bfi, bi.bars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use="pairwise", method="pearson" as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
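The kind of display described above can be approximated in base R. The following is a minimal sketch, not the psych implementation: lower_tri_display is a hypothetical helper name, and the built-in attitude data set stands in for sat.act so that the example runs without psych installed.

```r
# Hypothetical helper (not part of psych): print the lower triangle of a
# correlation matrix and return the full matrix invisibly.
lower_tri_display <- function(data, digits = 2) {
  # Pairwise deletion and Pearson correlations, the defaults described above
  r <- cor(data, use = "pairwise.complete.obs", method = "pearson")
  shown <- format(round(r, digits), nsmall = digits)  # character matrix
  shown[upper.tri(shown)] <- ""                       # blank out the upper triangle
  print(noquote(shown))
  invisible(r)
}

# attitude is a base R data set used here only for illustration
r_full <- lower_tri_display(attitude[, 1:4])
```

Returning the full matrix invisibly means the rounded display is only for reading, while the unrounded correlations remain available for later computation.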

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA
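The two displays above can be imitated with a few lines of base R. This is a sketch under stated assumptions, not the code behind lowerUpper: combine_lower_upper is a hypothetical name, the two "groups" are simply the first and second halves of the built-in attitude data, and the sign of the difference (first minus second) is inferred from the output shown above.

```r
# Hypothetical helper (not psych's lowerUpper): put one matrix below the
# diagonal and a second matrix (or the difference) above it.
combine_lower_upper <- function(lower, upper, diff = FALSE) {
  stopifnot(all(dim(lower) == dim(upper)))
  out <- lower
  above <- upper.tri(out)
  # t() mirrors the lower-triangle values into the cells above the diagonal
  out[above] <- if (diff) (t(lower) - t(upper))[above] else t(upper)[above]
  diag(out) <- NA   # the diagonal carries no information in this display
  out
}

g1 <- cor(attitude[1:15, 1:3])    # illustrative "group 1"
g2 <- cor(attitude[16:30, 1:3])   # illustrative "group 2"
both  <- combine_lower_upper(g1, g2)
diffs <- combine_lower_upper(g1, g2, diff = TRUE)
round(both, 2)
round(diffs, 2)
```

Setting the diagonal to NA makes it obvious to the reader that the matrix is a composite and should not be passed on as an ordinary correlation matrix.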

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.
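The color coding can be illustrated without psych. The sketch below uses only base R graphics; it is not how corPlot is implemented, the 21-step palette is an arbitrary choice, and Harman74.cor (a correlation matrix of 24 psychological tests shipped with base R) is used purely as example data.

```r
# Example correlation matrix from base R's datasets package
r <- Harman74.cor$cov
n <- nrow(r)

# 21 colours from red (r = -1) through white (r = 0) to blue (r = +1);
# the number of steps is an arbitrary choice for this sketch
palette_rwb <- colorRampPalette(c("red", "white", "blue"))(21)

# image() draws the first column of z at the bottom, so transpose and reverse
# to keep the diagonal running from top left to bottom right
image(1:n, 1:n, t(r)[, n:1], zlim = c(-1, 1), col = palette_rwb,
      axes = FALSE, xlab = "", ylab = "",
      main = "Heat map of the Harman74 correlations")
```

Fixing zlim at c(-1, 1) keeps white at a correlation of zero even when, as here, all the observed correlations are positive.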

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
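The mapping from correlation to radial length stated above is a simple linear rescaling. A minimal sketch, assuming that rescaling is all that is needed: spoke_length is a hypothetical name, the plotting is plain base graphics rather than the spider function, and the example correlations again come from Harman74.cor.

```r
# Rescale correlations in [-1, 1] to radial lengths in [0, 1]
spoke_length <- function(r) (r + 1) / 2

spoke_length(c(-1, 0, 0.5, 1))   # returns 0, 0.5, 0.75, 1

# Correlations of the first Harman74 variable with all 24 variables
r <- Harman74.cor$cov[1, ]
len <- spoke_length(r)
angle <- 2 * pi * (seq_along(r) - 1) / length(r)   # one spoke per variable

# Draw the closed outline formed by the ends of the spokes
plot(0, 0, type = "n", asp = 1, xlim = c(-1, 1), ylim = c(-1, 1),
     axes = FALSE, xlab = "", ylab = "")
polygon(len * cos(angle), len * sin(angle), col = "grey80")
```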

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
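The layout of raw and adjusted probabilities can be reproduced with cor.test and p.adjust from the stats package. The sketch below is an illustration of the idea rather than the corr.test code: it loops over variable pairs in the built-in attitude data, and the object names (p_raw, p_holm, p_both) are made up for this example.

```r
dat <- attitude[, 1:4]          # base R data set, for illustration only
vars <- names(dat)
k <- length(vars)

# Raw two-tailed p values for every pair, stored below the diagonal
p_raw <- matrix(NA_real_, k, k, dimnames = list(vars, vars))
for (i in 2:k) {
  for (j in 1:(i - 1)) {
    p_raw[i, j] <- cor.test(dat[[i]], dat[[j]])$p.value
  }
}

# Holm adjustment across all k * (k - 1) / 2 tests
lower <- lower.tri(p_raw)
p_holm <- p.adjust(p_raw[lower], method = "holm")

# Raw values below the diagonal, Holm-adjusted values above it
p_adj <- p_raw
p_adj[lower] <- p_holm
p_both <- p_raw
p_both[upper.tri(p_both)] <- t(p_adj)[upper.tri(p_adj)]
round(p_both, 3)
```

Because the Holm procedure never makes a p value smaller, each entry above the diagonal is at least as large as its mirror image below it.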

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)


> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)  #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23
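The arithmetic behind the first two tests can be checked by hand with base R alone. This is a sketch of the standard formulas (the t test of a correlation and the Fisher z test for two independent correlations), using the same n and r values as cases 1 and 2; it is not the code of r.test itself, and the variable names are chosen here for illustration.

```r
# Case 1: is a single correlation different from zero?
n <- 50
r <- 0.3
t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_value), df = n - 2)
# Confidence interval found on the Fisher z scale and transformed back
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

# Case 2: do two independent correlations differ?
n1 <- 30
n2 <- 30
r12 <- 0.4
r34 <- 0.6
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
z_value <- abs(atanh(r12) - atanh(r34)) / se_diff
p_diff <- 2 * pnorm(-z_value)

print(round(c(t = t_value, p = p_value, lower = ci[1], upper = ci[2]), 3))
print(round(c(z = z_value, p = p_diff), 2))
```

Cases 3 and 4 involve dependent correlations and need the additional correlations among the variables, so they are not reproduced in this sketch.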

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
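The statistic described above can be sketched in a few lines of base R. This is an illustration of the idea (summing squared Fisher z values and scaling by n - 3), not the code of cortest, which may differ in detail; the built-in mtcars data and the choice of six columns are assumptions made only so that the example is self-contained.

```r
# Six numeric columns of the built-in mtcars data (an arbitrary choice)
dat <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
n <- nrow(dat)
R <- cor(dat)

# Fisher z transform of the unique off-diagonal correlations
z <- atanh(R[lower.tri(R)])

# Under the null hypothesis each z has variance 1/(n - 3), so
# (n - 3) * sum(z^2) is approximately chi square with p(p - 1)/2 df
chi_sq <- (n - 3) * sum(z^2)
df <- length(z)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

cat("Chi square =", round(chi_sq, 2), "with df =", df,
    "and probability <", signif(p_value, 2), "\n")
```

With six variables there are 15 unique correlations, which is where the df = 15 in the output above comes from.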

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
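A small simulation shows how much φ can underestimate the latent correlation. The sketch below uses only base R and assumes both variables are cut at zero; in that special case the latent correlation can be recovered in closed form as sin(πφ/2), whereas the general tetrachoric estimate requires an iterative fit. The seed and sample size are arbitrary.

```r
set.seed(17)
n <- 100000
rho <- 0.5

# Two standard normal variables that correlate rho in the population
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Dichotomize both at 0 and correlate the resulting 0/1 scores
x_d <- as.numeric(x > 0)
y_d <- as.numeric(y > 0)
phi <- cor(x_d, y_d)

# Closed form that holds only when both cuts are at the median
rho_hat <- sin(pi * phi / 2)

print(round(c(latent = cor(x, y), phi = phi, recovered = rho_hat), 2))
```

For rho = 0.5 the expected value of φ under these cuts is 1/3, which is the pair of values shown in the draw.tetra figure.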

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will also sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
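The smoothing steps just described (make the eigen values positive, rescale them, rebuild the matrix) can be sketched in base R as follows. The 3 x 3 matrix and the tolerance are made up for illustration, and the sketch is not the code of cor.smooth, which may handle details differently.

```r
# A 3 x 3 "correlation" matrix that is not positive semi-definite
R <- matrix(c(1.0,  0.9,  0.9,
              0.9,  1.0, -0.9,
              0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)

e <- eigen(R, symmetric = TRUE)
print(round(e$values, 2))   # the smallest eigen value is negative

# Replace non-positive eigen values with a small positive number
# (this tolerance is an arbitrary choice for the sketch)
tol <- 1e-6
vals <- pmax(e$values, tol)

# Rescale so the eigen values again sum to the number of variables
vals <- vals * nrow(R) / sum(vals)

# Rebuild the matrix and restore the unit diagonal
smoothed <- cov2cor(e$vectors %*% diag(vals) %*% t(e$vectors))
print(round(smoothed, 2))
```

The smoothed matrix keeps the pattern of signs but shrinks the correlations to values that a real data set could produce.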

                                                  4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use


> draw.tetra()

[Figure 14 plot: a bivariate normal distribution of X and Y (rho = 0.5, phi = 0.33) cut at τ on X and Τ on Y, with the four regions labeled X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; and X < τ, Y < Τ. The marginal normal densities, dnorm(x), are drawn along each axis with the areas X > τ and Y > Τ marked.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20, cuts=c(0,0))

[Figure 15 plot: perspective view of the bivariate density (axes x, y, and z), rho = 0.5.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values, or the group means.
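Equation 1 is an exact identity, which can be confirmed numerically. The sketch below uses base R only, with simulated data; the group sizes, effect sizes, and variable names are hypothetical. It is meant to illustrate the decomposition, not to reproduce the output of statsBy.

```r
set.seed(1)
# Hypothetical data: 5 groups of unequal size with different group means
group <- rep(1:5, times = c(10, 20, 15, 25, 30))
x <- rnorm(length(group)) + 0.5 * group
y <- 0.3 * x + rnorm(length(group)) - 0.4 * group

# Between part: each observation is replaced by its group mean
x_bg <- ave(x, group)
y_bg <- ave(y, group)
# Within part: deviations from the group mean
x_wg <- x - x_bg
y_wg <- y - y_bg

# eta is the correlation of the raw scores with their within or between part
eta_x_wg <- cor(x, x_wg)
eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg)
eta_y_bg <- cor(y, y_bg)

r_wg <- cor(x_wg, y_wg)  # pooled within group correlation
r_bg <- cor(x_bg, y_bg)  # correlation of group means, weighted by group size

recomposed <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
print(c(observed = cor(x, y), recomposed = recomposed))
```

Note that the within and between correlations can differ in sign, so the observed correlation alone says little about either level.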

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
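To see why a correlation matrix is enough, note that the standardized β weights solve Rxx β = Rxy and the squared multiple correlation is β'Rxy. The sketch below checks this in base R against lm on standardized raw data. It is an illustration of the underlying algebra, not the code of setCor; the mtcars data and the choice of variables are arbitrary.

```r
# Example data from base R; the choice of variables is arbitrary
dat <- mtcars[, c("mpg", "wt", "hp", "qsec")]
R <- cor(dat)

y <- "mpg"
x <- c("wt", "hp", "qsec")

# Standardized beta weights: solve Rxx %*% beta = Rxy
beta <- solve(R[x, x], R[x, y])
# Squared multiple correlation: R2 = beta' Rxy
R2 <- sum(beta * R[x, y])

# The same model fit by lm on standardized raw data
fit <- lm(mpg ~ wt + hp + qsec, data = as.data.frame(scale(dat)))

print(round(rbind(from_matrix = beta, from_lm = coef(fit)[-1]), 3))
print(round(c(R2_matrix = R2, R2_lm = summary(fit)$r.squared), 3))
```

What the matrix alone cannot supply is the sample size, which is why standard errors and probabilities require the number of observations to be given separately.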

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x=1:4, data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31


Multiple R
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.69      0.63          0.50       0.58         0.48

multiple R2
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.48      0.40          0.25       0.34         0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences  Vocabulary  SentCompletion  FirstLetters
     3.69        3.88            3.00          1.35

Unweighted multiple R
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.59      0.58          0.49       0.58         0.45

Unweighted multiple R2
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.34      0.34          0.24       0.33         0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2 =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x=3:4, data=Thurstone, z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.58      0.46          0.21       0.18         0.30


multiple R2
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
          0.331     0.210         0.043      0.032        0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion  FirstLetters
          1.02          1.02

Unweighted multiple R
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.44      0.35          0.17       0.14         0.26

Unweighted multiple R2
FourLetterWords  Suffixes  LetterSeries  Pedigrees  LetterGroup
           0.19      0.12          0.03       0.02         0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2 =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,\ldots,i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
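The paths and the bootstrap can be sketched with lm in base R. The data below are simulated, and the helper function paths and the choice of 500 resamples are hypothetical choices made for this illustration; the sketch shows the logic of a percentile bootstrap of ab and is not the code of the mediate function.

```r
set.seed(2004)
n <- 200
# Simulated data; the variable names x, m, and y follow the text
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.2 * x + rnorm(n)
dat <- data.frame(x, m, y)

# Fit the three regressions once and return the path estimates
paths <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]        # x -> m
  fit <- coef(lm(y ~ x + m, data = d))
  b <- fit[["m"]]                              # m -> y, holding x constant
  direct <- fit[["x"]]                         # x -> y, holding m constant
  total <- coef(lm(y ~ x, data = d))[["x"]]    # x -> y, ignoring m
  c(a = a, b = b, ab = a * b, direct = direct, total = total)
}
est <- paths(dat)

# Percentile bootstrap of the indirect effect ab
boot_ab <- replicate(500, paths(dat[sample(n, replace = TRUE), ])[["ab"]])
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(est, 2))
print(round(ci, 2))
```

For ordinary least squares the total effect equals the direct effect plus ab exactly, so the indirect effect can also be read as the drop in the x coefficient once m is added to the model.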


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

Total Direct effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t direct =  2.5   with probability =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t direct =  1.35   with probability =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB   =  0.33
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.04    Upper CI =  0.69
R2 of model =  0.31
 To see the longer output, specify short = FALSE in the print statement

 Full output

 Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates     (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

 'a'  effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

 'b'  effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

 'ab'  effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


> mediate.diagram(preacher)

[Figure 16 plot: "Mediation model" path diagram with nodes THERAPY, ATTRIB, and SATIS; path labels 0.82, 0.4, c = 0.76, and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

[Figure 17 plot: "Regression Models" path diagram with nodes THERAPY, ATTRIB, and SATIS; path labels 0.43, 0.4, and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.


for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the $i$th eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
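The definition above can be checked numerically. The following is a minimal base R sketch, not the setCor implementation: the correlation matrix is invented for illustration, and the names set_R2, det_R2 and avg_cc are hypothetical. It computes the squared canonical correlations as eigenvalues, forms the set correlation from them, compares that with the equivalent determinant form, and also reports the average squared canonical correlation mentioned as an alternative.

```r
# Made-up correlation matrix for two predictors (x1, x2) and two criteria (y1, y2).
# These values are an assumption for this sketch; they are not taken from sat.act.
vn <- c("x1", "x2", "y1", "y2")
R <- matrix(c(1.0, 0.3, 0.5, 0.2,
              0.3, 1.0, 0.1, 0.4,
              0.5, 0.1, 1.0, 0.3,
              0.2, 0.4, 0.3, 1.0),
            nrow = 4, byrow = TRUE, dimnames = list(vn, vn))
x <- c("x1", "x2")
y <- c("y1", "y2")
Rxx <- R[x, x]; Ryy <- R[y, y]; Rxy <- R[x, y]; Ryx <- R[y, x]

# Eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations.
M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% Ryx
lambda <- Re(eigen(M)$values)

set_R2 <- 1 - prod(1 - lambda)                   # set correlation from the eigenvalues
det_R2 <- 1 - det(R) / (det(Rxx) * det(Ryy))     # equivalent determinant form
avg_cc <- mean(lambda)                           # average squared canonical correlation

print(round(c(set_R2 = set_R2, det_R2 = det_R2, avg_cc = avg_cc), 3))
```

Because 1 minus the product of the (1 - λ) terms can never be smaller than the largest single λ, the set correlation is always at least as large as the average squared canonical correlation, which is one way to see why it can overstate the overall relationship between the sets.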

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
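To see why a correlation matrix is sufficient, recall that standardized regression weights are the solution of Rxx β = rxy and that the squared multiple correlation is rxy'β. The sketch below is an illustration in base R only; it does not call setCor, and the mtcars data set and the chosen variables are assumptions made purely so the example runs without psych.

```r
# Standardized regression weights from a correlation matrix alone:
#   beta = Rxx^-1 rxy   and   R2 = rxy' beta
# mtcars ships with base R and is used here only for illustration.
vars <- c("mpg", "wt", "hp", "qsec")
R <- cor(mtcars[, vars])
x <- c("wt", "hp", "qsec")
y <- "mpg"

beta <- solve(R[x, x], R[x, y])   # standardized beta weights
R2 <- sum(R[x, y] * beta)         # squared multiple correlation

# The same quantities from the raw data, after standardizing every variable.
z <- as.data.frame(scale(mtcars[, vars]))
fit <- lm(mpg ~ wt + hp + qsec, data = z)

print(round(cbind(from_matrix = beta, from_lm = coef(fit)[-1]), 3))
```

Significance tests additionally need the sample size, which is why the number of observations has to be supplied when only a matrix is given.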

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")

> model1 <- lm(ACT~ gender + education + age, data=sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Path diagram "Moderation model": ACT, gender, and ACTXgndr predicting SATQ, with education as the mediator.]

Figure 18: Moderated multiple regression requires the raw data.

Compare the lm results with the output from setCor.

> #compare with setCor

> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the $R^2$ is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

Factor correlations
       MR1    MR2    MR3
MR1   1.00   0.59   0.54
MR2   0.59   1.00   0.52
MR3   0.54   0.52   1.00
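The general idea behind these converters can be sketched in a few lines of base R. The helper below, to_latex, is hypothetical and is not how fa2latex or df2latex are implemented; it only shows how a numeric matrix with row and column names might be written out as the lines of a LaTeX tabular environment. The two rows of loadings are copied from Table 2.

```r
# Hypothetical helper (not part of psych): turn a numeric matrix into the
# lines of a LaTeX tabular, one row per variable.
to_latex <- function(m, digits = 2) {
  cells <- format(round(m, digits), nsmall = digits)  # fixed number of decimals
  body <- apply(cells, 1, paste, collapse = " & ")    # join the cells of each row
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(m)), "}"),
    paste0("Variable & ", paste(colnames(m), collapse = " & "), " \\\\"),
    "\\hline",
    paste0(rownames(m), " & ", body, " \\\\"),
    "\\end{tabular}")
}

loadings <- matrix(c(0.91, -0.04,
                     0.89,  0.06),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Sentences", "Vocabulary"), c("MR1", "MR2")))
tab <- to_latex(loadings)
cat(tab, sep = "\n")
```

The psych functions add considerably more (APA style rules, communalities, factor correlations), so this sketch is only meant to make the table-to-markup step concrete.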


                                                  7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
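Several of the helpers in this list are short enough that their arithmetic can be written out directly. The base R sketch below uses hypothetical names (fisher_z, geo_mean, harm_mean, super_matrix) so that it runs without psych; it illustrates the standard formulas and is not the package's own code.

```r
# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r)), which equals atanh(r).
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))

# Geometric mean (requires x > 0) and harmonic mean (requires x != 0).
geo_mean <- function(x) exp(mean(log(x)))
harm_mean <- function(x) 1 / mean(1 / x)

# Block-diagonal "super matrix": A on the top left, B on the lower right, 0s elsewhere.
super_matrix <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

z <- fisher_z(0.5)
g <- geo_mean(c(1, 2, 4))
h <- harm_mean(c(1, 2, 4))
S <- super_matrix(diag(2), matrix(1, 2, 3))
print(S)
```

For the same three numbers the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean, so the choice of mean matters for rates and ratios.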

                                                  8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")


                                                  10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                  11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                                  References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



> data(bfi)
> png('bibars.png')
> with(bfi, bi.bars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
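The lower off diagonal display described above can be approximated in base R. The following is only a sketch of the idea, not the code lowerCor itself uses; the data frame `x` and its columns are simulated stand-ins introduced for illustration.

```r
# Simulated data standing in for a real data frame (hypothetical values).
set.seed(42)
x <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
x$b <- x$b + 0.5 * x$a          # induce a correlation between a and b
x$c[c(3, 17)] <- NA             # a few missing values

# Pairwise-complete Pearson correlations (the full, symmetric matrix).
r <- cor(x, use = "pairwise.complete.obs", method = "pearson")

# Round to 2 digits and blank out the cells above the diagonal for display.
lower <- format(round(r, 2), nsmall = 2)
lower[upper.tri(lower)] <- ""
print(noquote(lower))
```

The full matrix `r` remains available for further analysis; only the printed copy is trimmed.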

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
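To make the layout of these combined matrices concrete, here is a minimal base R sketch. The helper `combine_lower_upper` is a hypothetical name introduced for illustration and is not part of psych; it assumes both inputs are full symmetric correlation matrices with the same variables, and the two matrices are simulated.

```r
# Two simulated correlation matrices standing in for two groups.
set.seed(1)
vars <- c("x", "y", "z")
g1 <- cor(matrix(rnorm(300), ncol = 3, dimnames = list(NULL, vars)))
g2 <- cor(matrix(rnorm(300), ncol = 3, dimnames = list(NULL, vars)))

# Hypothetical helper: first matrix below the diagonal, second matrix
# (or first minus second, when diff = TRUE) above it, NA on the diagonal.
combine_lower_upper <- function(lower, upper, diff = FALSE) {
  out <- lower
  above <- upper.tri(out)
  out[above] <- if (diff) lower[above] - upper[above] else upper[above]
  diag(out) <- NA
  out
}

both  <- combine_lower_upper(g1, g2)
diffs <- combine_lower_upper(g1, g2, diff = TRUE)
print(round(both, 2))
print(round(diffs, 2))
```

Reading across the diagonal of `both` compares the same pair of variables in the two groups; in `diffs` the cell above the diagonal gives that comparison directly.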

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)
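The radial encoding behind a spider plot can be sketched in a few lines of base R. The linear rescaling (r + 1)/2 is an assumption chosen to match the description above (length 0 at r = -1, length 1 at r = 1); the spider function may compute its lengths differently. The correlations are made-up values following a cosine pattern, as would be expected in a circumplex.

```r
# Correlations of one variable with k variables arranged around a circle
# (made-up values following a cosine pattern).
k <- 24
angle <- 2 * pi * (seq_len(k) - 1) / k
r <- cos(angle)

# Assumed linear rescaling: r = -1 gives length 0, r = 1 gives length 1.
radial_length <- function(r) (r + 1) / 2
len <- radial_length(r)

# Convert polar coordinates (length, angle) to Cartesian points.
px <- len * cos(angle)
py <- len * sin(angle)

# Draw the closed outline and the spokes when run interactively.
if (interactive()) {
  plot(c(px, px[1]), c(py, py[1]), type = "l", asp = 1,
       xlab = "", ylab = "", main = "Radial display of correlations")
  segments(0, 0, px, py, col = "grey")
}
```

With circumplex data the outline is a smooth, lopsided loop: long spokes toward the variables that correlate positively and short ones on the opposite side.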

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, the raw probability values are not corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
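The raw-below, adjusted-above layout can be reproduced with cor.test and p.adjust from the stats package. This is a sketch under stated assumptions rather than the corr.test implementation: the data are simulated, and the names `pair_idx` and `p_mat` are introduced here for illustration.

```r
# Simulated data with one genuine correlation (a with b).
set.seed(123)
n <- 200
x <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n))
x$b <- x$b + 0.4 * x$a

# Raw two-tailed p value for every pair of variables.
vars <- names(x)
pair_idx <- combn(vars, 2)      # one column per pair of variable names
raw_p <- apply(pair_idx, 2,
               function(v) cor.test(x[[v[1]]], x[[v[2]]])$p.value)

# Holm adjustment across all pairs.
holm_p <- p.adjust(raw_p, method = "holm")

# Raw probabilities below the diagonal, adjusted ones above it.
p_mat <- matrix(0, length(vars), length(vars), dimnames = list(vars, vars))
for (j in seq_len(ncol(pair_idx))) {
  p_mat[pair_idx[2, j], pair_idx[1, j]] <- raw_p[j]
  p_mat[pair_idx[1, j], pair_idx[2, j]] <- holm_p[j]
}
print(round(p_mat, 3))
```

An adjusted value is never smaller than its raw counterpart, so a correlation that looks significant below the diagonal may no longer be so above it.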

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests, based upon an article by Steiger (1980).

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

Which of the four tests r.test performs depends upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23
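The first three tests can be checked by hand with standard textbook formulas: the t test for a single correlation with a Fisher z confidence interval, the z test for two independent correlations, and Williams's t for two dependent correlations that share a variable (the Steiger case A comparison of r12 with r13). The sketch below assumes these are the formulas r.test applies; the function names are hypothetical and not part of psych.

```r
# Case 1: is a single correlation r, from n observations, different from zero?
single_r_test <- function(n, r, conf = 0.95) {
  t <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p <- 2 * pt(-abs(t), df = n - 2)
  se <- 1 / sqrt(n - 3)                       # standard error of Fisher z
  ci <- tanh(atanh(r) + c(-1, 1) * qnorm(1 - (1 - conf) / 2) * se)
  list(t = t, p = p, ci = ci)
}

# Case 2: difference between two independent correlations.
independent_r_test <- function(n, r12, r34, n2 = n) {
  z <- (atanh(r12) - atanh(r34)) / sqrt(1 / (n - 3) + 1 / (n2 - 3))
  list(z = z, p = 2 * pnorm(-abs(z)))
}

# Case 3: difference between r12 and r13 measured in the same sample
# (Williams's t; detR is the determinant of the 3 x 3 correlation matrix).
dependent_r_test <- function(n, r12, r13, r23) {
  detR <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23
  rbar <- (r12 + r13) / 2
  t <- (r12 - r13) * sqrt((n - 1) * (1 + r23) /
         (2 * (n - 1) / (n - 3) * detR + rbar^2 * (1 - r23)^3))
  list(t = t, p = 2 * pt(-abs(t), df = n - 3))
}

case1 <- single_r_test(50, 0.3)
case2 <- independent_r_test(30, 0.4, 0.6)
case3 <- dependent_r_test(103, r12 = 0.4, r13 = 0.5, r23 = 0.1)
print(round(unlist(case1), 3))
print(round(unlist(case2), 3))
print(round(unlist(case3), 3))
```

Under these assumptions the results agree with the printed output above to two decimals, except that the z in case 2 carries the sign of r12 - r34 where r.test prints its absolute value.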

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
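The test just described can be sketched in a few lines of base R. The sketch assumes the Fisher z version of the statistic, (n - 3) times the sum of the squared z-transformed off-diagonal correlations, with one degree of freedom per correlation; the helper name `identity_test` is hypothetical, and the built-in mtcars data are used only so that the example runs without psych.

```r
# Hypothetical helper (not a psych function): chi square test that all
# off-diagonal population correlations are zero, following the description
# of Steiger (1980) in the text.
identity_test <- function(R, n) {
  r    <- R[lower.tri(R)]                 # the unique off-diagonal correlations
  chi2 <- (n - 3) * sum(atanh(r)^2)       # sum of squared Fisher z scores, scaled by n - 3
  df   <- length(r)                       # p * (p - 1) / 2
  list(chi2 = chi2, df = df, p = pchisq(chi2, df, lower.tail = FALSE))
}

vars <- mtcars[, c("mpg", "disp", "hp", "wt")]   # illustration data only
res  <- identity_test(cor(vars), n = nrow(vars))
print(res)
```

With six variables, as in sat.act, the same counting rule gives the 15 degrees of freedom reported above.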

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
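A small base R sketch may make the relation between φ and the tetrachoric correlation concrete. It assumes a standard bivariate normal cut at 0 on both variables, and it recovers the latent correlation by root finding on a single cell probability rather than by the fitting procedure psych itself uses; the helper name `p_both_above` is hypothetical.

```r
# Hypothetical helper: probability that X > tau_x and Y > tau_y under a
# standard bivariate normal with correlation rho, found by integrating the
# conditional probability of Y > tau_y over the density of X.
p_both_above <- function(rho, tau_x = 0, tau_y = 0) {
  integrand <- function(x) {
    dnorm(x) * pnorm((rho * x - tau_y) / sqrt(1 - rho^2))
  }
  integrate(integrand, lower = tau_x, upper = Inf, rel.tol = 1e-8)$value
}

rho <- 0.5
p11 <- p_both_above(rho)          # proportion of cases above both cuts

# phi for a 2 x 2 table in which both margins are .5
phi <- (p11 - 0.25) / 0.25

# Recover the latent correlation from the observed cell proportion
rho_hat <- uniroot(function(r) p_both_above(r) - p11,
                   interval = c(-0.99, 0.99), tol = 1e-8)$root

print(round(c(phi = phi, rho_hat = rho_hat), 2))
```

For a latent correlation of 0.5 cut at the medians, φ is about 0.33, which illustrates how much the Pearson correlation of the dichotomized variables underestimates the latent value.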

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
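The smoothing steps described above can be sketched in base R. This is an illustration of the idea only, not the cor.smooth source: the function name `smooth_sketch`, the tolerance value, and the made-up 3 x 3 matrix are all assumptions.

```r
# Hypothetical sketch of eigen value smoothing (not the psych implementation).
smooth_sketch <- function(R, eig_tol = 1e-6) {
  e    <- eigen(R, symmetric = TRUE)
  vals <- e$values
  if (min(vals) >= eig_tol) return(R)             # nothing to repair
  vals[vals < eig_tol] <- eig_tol                 # lift the offending eigen values
  vals <- vals * nrow(R) / sum(vals)              # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)  # rebuild the matrix
  S <- cov2cor(S)                                 # restore a unit diagonal
  dimnames(S) <- dimnames(R)
  S
}

# A made-up "correlation" matrix that is not positive semi-definite
bad <- matrix(c( 1.0, 0.9, -0.9,
                 0.9, 1.0,  0.9,
                -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

smoothed <- smooth_sketch(bad)
print(min(eigen(bad, symmetric = TRUE)$values))       # negative before smoothing
print(min(eigen(smoothed, symmetric = TRUE)$values))  # positive after smoothing
```

The smoothed matrix keeps a unit diagonal, but its off-diagonal elements are pulled toward values that a real set of variables could produce.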

                                                    4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use


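> draw.tetra()

[Figure 14 plot: a bivariate normal scatter of X and Y (axes from -3 to 3) with rho = 0.5 and phi = 0.33, cut at τ on X and Τ on Y into four quadrants (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with the marginal normal densities dnorm(x) marking the regions X > τ and Y > Τ.]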

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))

[Figure 15 plot: perspective view of the bivariate normal density (axes x, y, z), titled "Bivariate density rho = 0.5".]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or with the group means.
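Equation 1 can be checked numerically with a short base R sketch. The simulated data, the group sizes, and all variable names are made up for illustration; the between group scores are the group means assigned to each individual, and the within group scores are the deviations from those means.

```r
# Illustrative simulated data: 5 groups of 20 cases each (all values made up)
set.seed(42)
g <- rep(1:5, each = 20)
x <- rnorm(100) + g
y <- rnorm(100) - 0.5 * g + 0.3 * x

xb <- ave(x, g); yb <- ave(y, g)    # between part: group means, one per case
xw <- x - xb;    yw <- y - yb       # within part: deviations from group means

r_xy <- cor(x, y)                   # the ordinary (total) correlation

# Right hand side of Equation 1: each eta is the correlation of the raw
# scores with their within or between component
decomposed <- cor(x, xw) * cor(y, yw) * cor(xw, yw) +
              cor(x, xb) * cor(y, yb) * cor(xb, yb)

print(c(total = r_xy, decomposed = decomposed))
```

The two values agree to rounding error, because the within and between components are uncorrelated with each other by construction.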

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

 Average squared canonical correlation =  0.2
 Cohen's Set Correlation R2  =  0.69
 Unweighted correlation between the two sets =  0.73
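How regression weights come out of a correlation matrix alone can be sketched in base R. The sketch is not the setCor code: it applies the standard result that the standardized weights are the inverse of the predictor correlations times the predictor-criterion correlations, and it uses the built-in mtcars data (an arbitrary choice) so that the answer can be checked against lm.

```r
# Illustration data only: one criterion (mpg) and three predictors
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]
R    <- cor(vars)
x    <- c("disp", "hp", "wt")
y    <- "mpg"

Rxx  <- R[x, x]                       # correlations among the predictors
Rxy  <- R[x, y, drop = FALSE]         # correlations of predictors with the criterion

beta <- solve(Rxx, Rxy)               # standardized regression (beta) weights
R2   <- sum(beta * Rxy)               # squared multiple correlation
vif  <- diag(solve(Rxx))              # variance inflation factors, 1/(1 - SMC)

# Check against lm run on the standardized raw data
z   <- as.data.frame(scale(vars))
fit <- lm(mpg ~ disp + hp + wt, data = z)

print(round(cbind(from_matrix = beta[, 1], from_lm = coef(fit)[-1]), 3))
print(round(c(R2 = R2, vif), 2))
```

Because only Rxx and Rxy are needed, the same weights can be found when the raw data are not available, which is the situation with the Thurstone matrix.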

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

 Average squared canonical correlation =  0.21
 Cohen's Set Correlation R2  =  0.42
 Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
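The claim that the β weights do not change when covariates are partialled out can be checked with a base R sketch. It is not the setCor code: it removes a covariate z from a correlation matrix by the usual partial covariance formula and compares the result with the weights from a regression that includes z as a predictor. The mtcars variables are an arbitrary illustration.

```r
# Illustration data only: two criteria, two predictors, one covariate
vars <- mtcars[, c("mpg", "qsec", "hp", "wt", "disp")]
R    <- cor(vars)
y    <- c("mpg", "qsec")
x    <- c("hp", "wt")
z    <- "disp"

# Residual covariances of x and y after removing z:
# R* = R - R[, z] %*% solve(R[z, z]) %*% R[z, ]
keep <- c(x, y)
Rres <- R[keep, keep] -
  R[keep, z, drop = FALSE] %*% solve(R[z, z, drop = FALSE]) %*% R[z, keep, drop = FALSE]

# Weights for x found from the partialled matrix
beta_partial <- solve(Rres[x, x], Rres[x, y])

# Weights for x when z is simply included as another predictor (z comes first)
beta_full <- solve(R[c(z, x), c(z, x)], R[c(z, x), y])[-1, , drop = FALSE]

print(round(unname(beta_partial), 3))
print(round(unname(beta_full), 3))
```

The two sets of weights are identical, while the variance left to be explained in y is smaller once z has been removed, which is why the multiple correlations drop.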

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
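The paths can be estimated with ordinary regressions, as in the base R sketch below. It is not the mediate function: the data are simulated, the number of bootstrap samples (500) is an arbitrary choice, and the names `paths`, `c_total`, and `c_prime` are hypothetical. The sketch separates the total effect of x on y from the direct effect that remains once m is in the model.

```r
# Simulated data with a partly mediated effect (all values made up)
set.seed(1)
n   <- 200
x   <- rnorm(n)
m   <- 0.5 * x + rnorm(n)
y   <- 0.4 * m + 0.2 * x + rnorm(n)
dat <- data.frame(x, m, y)

# Hypothetical helper: the a, b, direct, and total paths from three regressions
paths <- function(d) {
  a      <- coef(lm(m ~ x, data = d))[["x"]]       # x -> m
  direct <- coef(lm(y ~ x + m, data = d))          # y on x and m together
  total  <- coef(lm(y ~ x, data = d))[["x"]]       # y on x alone
  c(a = a, b = direct[["m"]], c_prime = direct[["x"]], c_total = total)
}

est <- paths(dat)
ab  <- est[["a"]] * est[["b"]]                     # the indirect effect

# Percentile bootstrap of the indirect effect
boot_ab <- replicate(500, {
  p <- paths(dat[sample(n, replace = TRUE), ])
  p[["a"]] * p[["b"]]
})
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(c(est, ab = ab), 3))
print(round(ci, 3))
```

In least squares regression the total effect equals the direct effect plus ab exactly, so the indirect effect is the part of the total effect that disappears when the mediator is controlled.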


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
 To see the longer output, specify short = FALSE in the print statement

 Full output

 Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

 Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

 'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

 'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

 'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c( "SATV","SATQ"),x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c( "SATV" ),x = c("education","age"), m= "ACT", data =sat.act,std=TRUE,n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


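> mediate.diagram(preacher)

[Figure 16 plot: path diagram titled "Mediation model" with THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, and THERAPY to SATIS labeled c = 0.76 and c' = 0.43.]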

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure 17 plot: path diagram titled "Regression Models" with THERAPY and ATTRIB predicting SATIS; the values shown are 0.43, 0.4, and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


for speed. The default number of boot straps is 5000.
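What a moderated regression adds to the model can be shown with a short base R sketch. It assumes that moderation is represented by the product of the centered predictor and the centered moderator, as the product term in the moderation example suggests; the simulated data and all variable names are made up, and this is not the mediate code.

```r
# Simulated data in which the effect of x on y depends on w (all values made up)
set.seed(2)
n <- 300
x <- rnorm(n)
w <- rbinom(n, 1, 0.5)
y <- 0.5 * x + 0.3 * w + 0.4 * x * w + rnorm(n)

xc <- x - mean(x)            # center the predictor
wc <- w - mean(w)            # center the moderator
xw <- xc * wc                # the product term carries the moderation

fit_manual  <- lm(y ~ xc + wc + xw)   # product term built by hand
fit_formula <- lm(y ~ xc * wc)        # the same model through the formula interface

print(round(coef(fit_manual), 3))
```

The weight for the product term is the estimate of how much the slope of y on x changes per unit of the moderator. Because the product has to be formed case by case, this kind of model needs the raw data rather than a correlation matrix.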

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{xx}^{-1} R_{xy}^{-1}$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
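The difference between the two statistics is easy to see from the squared canonical correlations reported in the first Thurstone example. The base R sketch below first reuses those four values, and then finds the same quantities from raw data. For the second part it assumes that the eigen values in question are the squared canonical correlations, taken here from the standard form Ryy^-1 Ryx Rxx^-1 Rxy; that reading of the matrix, and the use of the built-in mtcars data, are assumptions made for illustration.

```r
# Squared canonical correlations reported in the first Thurstone example
lambda  <- c(0.6280, 0.1478, 0.0076, 0.0049)
set_R2  <- 1 - prod(1 - lambda)     # Cohen's set correlation
avg_cc2 <- mean(lambda)             # average squared canonical correlation
print(round(c(set_R2 = set_R2, avg_cc2 = avg_cc2), 2))

# The same quantities from raw data (illustration data only)
X   <- as.matrix(mtcars[, c("disp", "hp", "wt")])
Y   <- as.matrix(mtcars[, c("mpg", "qsec")])
Rxx <- cor(X); Ryy <- cor(Y); Rxy <- cor(X, Y)
R   <- cor(cbind(X, Y))

M   <- solve(Ryy) %*% t(Rxy) %*% solve(Rxx) %*% Rxy
lam <- sort(Re(eigen(M)$values), decreasing = TRUE)   # squared canonical correlations

R2_eigen <- 1 - prod(1 - lam)
R2_det   <- 1 - det(R) / (det(Rxx) * det(Ryy))        # equivalent determinant form
print(round(c(R2_eigen = R2_eigen, R2_det = R2_det, average = mean(lam)), 3))
```

For the Thurstone values a single large canonical correlation pushes the set correlation to about 0.69, while the average squared canonical correlation is only about 0.2.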

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


                                                    Call mediate(y = c(SATQ) x = c(ACT) m = education data = satact

                                                    mod = gender niter = 50 std = TRUE)

                                                    The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

                                                    Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

                                                    Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

                                                    Indirect effect (ab) of ACT on SATQ through education = -001

                                                    Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

                                                    Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

                                                    Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

                                                    Indirect effect (ab) of gender on SATQ through education = 0

                                                    Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

                                                    Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

                                                    Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

                                                    Indirect effect (ab) of ACTXgndr on SATQ through education = 0

                                                    Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

                                                    R2 of model = 037

                                                    To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
          education   se     t     Prob
ACT            0.16 0.04  4.22 2.77e-05
gender         0.09 0.04  2.50 1.28e-02
ACTXgndr      -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0
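The tables in the full output are tied together by the usual mediation identity: the indirect effect is the product of the a and b paths, and the total effect is the direct effect plus the indirect effect. The following is a minimal sketch in base R, not the code used by mediate. It uses the rounded coefficients from the output above with their decimal points restored, so agreement is only to about two decimal places.

```r
# Rounded coefficients read from the mediate output (decimal points restored)
a <- c(ACT = 0.16, gender = 0.09, ACTXgndr = -0.01)        # predictors -> education
b <- -0.04                                                 # education -> SATQ
c_direct <- c(ACT = 0.59, gender = -0.14, ACTXgndr = 0.00) # direct effects (c')
c_total  <- c(ACT = 0.58, gender = -0.14, ACTXgndr = 0.00) # total effects (c)

# Indirect effect: product of the two paths through the mediator
ab <- a * b

# Total effect rebuilt as direct + indirect
reconstructed <- c_direct + ab

# Largest disagreement with the printed total effects (rounding error only)
max_gap <- max(abs(reconstructed - c_total))

print(round(cbind(ab, reconstructed, c_total), 3))
```

Because the inputs are rounded to two decimals, small gaps between the rebuilt and printed total effects are expected.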

[Figure: path diagram titled "Moderation model", with predictors ACT, gender, and ACTXgndr, mediator education, and criterion SATQ.]

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
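Several of the summary statistics in the setCor output can be checked by hand. The following is a minimal sketch in base R, not psych's own code. It assumes that Cohen's set correlation is one minus the product of (1 - squared canonical correlation), and that the shrunken R2 and F follow the usual regression formulas with k = 3 predictors and n = 700 observations. The inputs are the rounded values printed in the output, so the results agree only to rounding.

```r
# Rounded values copied from the setCor output for this example
rc2    <- c(0.050, 0.033, 0.008)  # squared canonical correlations
R2_ACT <- 0.0272                  # multiple R2 for ACT
n      <- 700                     # number of observations
k      <- 3                       # predictors: gender, education, age

# Cohen's set correlation: variance shared between the two sets.
# It depends only on the canonical correlations, which is why it is
# the same whichever set is treated as the predictors.
set_R2  <- 1 - prod(1 - rc2)
avg_rc2 <- mean(rc2)

# Usual shrinkage (adjusted R2) and F test for a single criterion
shrunken_R2 <- 1 - (1 - R2_ACT) * (n - 1) / (n - k - 1)
F_ACT <- (R2_ACT / k) / ((1 - R2_ACT) / (n - k - 1))
p_F   <- pf(F_ACT, k, n - k - 1, lower.tail = FALSE)

print(round(c(set_R2 = set_R2, avg_rc2 = avg_rc2,
              shrunken_R2 = shrunken_R2, F_ACT = F_ACT), 4))
```

The ACT values can be compared with both the lm summary and the ACT column of the setCor output, which report the same regression.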

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
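A table like Table 2 might be produced along the following lines. This is a minimal sketch under stated assumptions, not the code used for this document: it assumes that psych is installed and still ships the Thurstone correlation matrix, that GPArotation is installed for the default oblique rotation, and that n.obs = 213 is the sample size for these data. The functions are called with their default arguments only.

```r
latex_lines <- character(0)

# Run only when the required packages are available
if (requireNamespace("psych", quietly = TRUE) &&
    requireNamespace("GPArotation", quietly = TRUE)) {
  library(psych)

  # Three factor solution of the 9 x 9 Thurstone correlation matrix
  # (n.obs = 213 is an assumption about the sample size)
  f3 <- fa(Thurstone, nfactors = 3, n.obs = 213)

  # fa2latex writes LaTeX source to the console; capture it as text
  latex_lines <- capture.output(fa2latex(f3))

  # df2latex does the same for a generic data frame
  df_lines <- capture.output(df2latex(head(mtcars[, 1:4])))
}

# Show the first few lines of the generated LaTeX, if any
print(head(latex_lines))
```

The captured lines can be written to a file with writeLines and pulled into a LaTeX document with \input.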


                                                    7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
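To make a few of the entries above concrete, here is a minimal sketch in base R of what the Fisher z transformation, the geometric and harmonic means, and a two-matrix super matrix compute. The function names fisher_z, geo_mean, har_mean and super_matrix are hypothetical stand-ins written for this illustration. They are not the psych implementations, which handle more cases (missing data, more than two matrices, and so on).

```r
# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r))
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))

# Geometric mean: exponential of the mean log (positive values only)
geo_mean <- function(x) exp(mean(log(x)))

# Harmonic mean: reciprocal of the mean reciprocal
har_mean <- function(x) 1 / mean(1 / x)

# Block-diagonal "super matrix": A top left, B lower right, 0s elsewhere
super_matrix <- function(A, B) {
  top    <- cbind(A, matrix(0, nrow(A), ncol(B)))
  bottom <- cbind(matrix(0, nrow(B), ncol(A)), B)
  rbind(top, bottom)
}

z <- fisher_z(0.5)
g <- geo_mean(c(1, 2, 4))
h <- har_mean(c(1, 2, 4))

# Example: two scoring keys combined into one 5 x 3 key matrix
S <- super_matrix(diag(2), matrix(1, 3, 1))
print(S)
```

The super matrix example mirrors the use mentioned in the list: stacking separate keys so that each set of items scores only on its own scales.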

                                                    8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
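Which of these data sets are present depends on the installed version of psych, so it can help to list them before use. The following is a minimal sketch that relies only on the base R data function. It assumes psych is installed. That sat.act is among the items is an assumption, and the code checks for it rather than relying on it.

```r
# List the data sets that the installed psych package provides
if (requireNamespace("psych", quietly = TRUE)) {
  listing    <- data(package = "psych")$results  # character matrix, one row per data set
  psych_data <- listing[, "Item"]
} else {
  psych_data <- character(0)                     # psych not installed
}
print(head(psych_data))

# Load one data set by name, but only if this version ships it
if ("sat.act" %in% psych_data) {
  data("sat.act", package = "psych")
  print(dim(sat.act))     # rows are subjects, columns are variables
  print(names(sat.act))
}
```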

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
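From the R prompt, installing the development version might look like the following. This is a minimal sketch, not a tested recipe: it assumes the repository URL given above is still live and that the tools needed to build a source package are installed. The wrapper name install_psych_dev is hypothetical, and the install call is left commented out so that nothing is downloaded by accident.

```r
# Repository named in the text; assumed to still host the development build
dev_repo <- "http://personality-project.org/r"

# Hypothetical helper: install psych from source from that repository
install_psych_dev <- function(repo = dev_repo) {
  install.packages("psych", repos = repo, type = "source")
}

# install_psych_dev()        # uncomment to install
# packageVersion("psych")    # then confirm which version is installed
```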

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")

                                                    10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                    11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

                                                    References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.



3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off-diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off-diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
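For readers who want to see what such a display involves, here is a minimal base R sketch of the same idea. It is not the psych implementation: the toy data frame is hypothetical, and the formatting choices are assumptions made only for illustration.

```r
# Base-R sketch of a lower off-diagonal correlation display (not psych's own code).
# Hypothetical toy data with one missing value to exercise pairwise deletion.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 1, 4, 3, 6, 5),
  c = c(6, 5, NA, 3, 2, 1)
)

# Full correlation matrix, using every complete pair for each pair of variables.
R <- cor(toy, use = "pairwise.complete.obs", method = "pearson")

# Round to 2 digits and blank out everything above the diagonal for display.
shown <- format(round(R, 2), nsmall = 2)
shown[upper.tri(shown)] <- ""
print(noquote(shown))
```

The full matrix R remains available for further analysis, while only the lower triangle is printed.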

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA
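The logic of combining two matrices can be sketched in a few lines of base R. This is an illustration rather than the lowerUpper source: combine_triangles is a hypothetical helper name, and the values are the education, age, and ACT correlations for the two groups shown above.

```r
# Base-R sketch of combining two correlation matrices into one display.
# combine_triangles() is a hypothetical helper, not a psych function.
vars <- c("education", "age", "ACT")
lower <- matrix(c(1.00, 0.61, 0.16,
                  0.61, 1.00, 0.15,
                  0.16, 0.15, 1.00), 3, 3, dimnames = list(vars, vars))  # males
upper <- matrix(c(1.00, 0.52, 0.16,
                  0.52, 1.00, 0.08,
                  0.16, 0.08, 1.00), 3, 3, dimnames = list(vars, vars))  # females

combine_triangles <- function(lower, upper, diff = FALSE) {
  out <- lower                      # values below the diagonal come from 'lower'
  above <- upper.tri(out)
  # Above the diagonal: the second matrix, or (first - second) when diff = TRUE.
  out[above] <- if (diff) (lower - upper)[above] else upper[above]
  diag(out) <- NA
  out
}

both  <- combine_triangles(lower, upper)
diffs <- combine_triangles(lower, upper, diff = TRUE)
print(round(both, 2))
print(round(diffs, 2))
```

With these inputs the education-age entry above the diagonal of diffs is 0.61 - 0.52 = 0.09, which agrees with the printed lowerUpper output.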

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these raw probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
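To make the Holm correction concrete, the following sketch applies p.adjust from the stats package to four made-up probability values and repeats the calculation by hand. The values are hypothetical and serve only to show the step-down logic.

```r
# Holm (1979) step-down adjustment using base R's p.adjust().
# The raw p values are made up for illustration.
p_raw  <- c(0.010, 0.040, 0.030, 0.005)
p_holm <- p.adjust(p_raw, method = "holm")

# By hand: sort ascending, multiply the i-th smallest value by (m - i + 1),
# enforce monotonicity with a running maximum, and cap at 1.
m <- length(p_raw)
o <- order(p_raw)
by_hand <- numeric(m)
by_hand[o] <- pmin(1, cummax((m - seq_len(m) + 1) * p_raw[o]))

print(rbind(raw = p_raw, holm = p_holm, by_hand = by_hand))
```

The smallest raw value is multiplied by the number of tests, the next by one fewer, and so on, which is why the adjusted values above the diagonal in Table 1 are never smaller than the raw values below it.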

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53
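The arithmetic behind this first case can be checked with base R. The sketch below assumes the usual t test for a correlation with n - 2 degrees of freedom and a 95% confidence interval based on the Fisher z transformation; it is meant to show where the numbers come from, not to reproduce the r.test source.

```r
# Sketch of the single-correlation test for n = 50, r = 0.3 (base R only).
n <- 50
r <- 0.3

# t statistic with n - 2 degrees of freedom and its two-tailed probability.
t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_value), df = n - 2)

# 95% confidence interval via the Fisher z transformation (se = 1/sqrt(n - 3)).
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(c(t = t_value, p = p_value, lower = ci[1], upper = ci[2]), 3))
```

Rounded to two decimals, these values match the t value and confidence interval shown in the output above.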

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32
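The second case is the familiar Fisher z test for two independent correlations. A minimal sketch, assuming a standard error of sqrt(1/(n1 - 3) + 1/(n2 - 3)) and reporting the absolute value of z:

```r
# Sketch of the test for two independent correlations (base R only).
n1  <- 30
n2  <- 30        # n2 defaults to n when it is not specified
r12 <- 0.4
r34 <- 0.6

# Difference of the Fisher z transformed correlations and its standard error.
z_diff  <- atanh(r12) - atanh(r34)
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))

z_value <- abs(z_diff) / se_diff          # reported here as an absolute value
p_value <- 2 * pnorm(-z_value)            # two-tailed probability

print(round(c(z = z_value, p = p_value), 2))
```

With two samples of only 30 cases, a difference between correlations of 0.4 and 0.6 is well within what would be expected by chance.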

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n =  103 ,  r12 =  0.4 ,  r23 =  0.1 ,  r13 =  0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37
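For the dependent case, one formula of the kind discussed by Steiger (1980), usually attributed to Williams, compares r12 with r13 while taking the correlation r23 between the two unshared variables into account. The sketch below assumes that this is the statistic being reported; it does reproduce the t value shown above, but it should be read as an illustration rather than as the r.test source.

```r
# Sketch of a t test for two dependent correlations that share one variable
# (r12 vs. r13, with r23 the correlation between the two unshared variables).
# Assumption: a Williams-type formula as discussed by Steiger (1980).
n   <- 103
r12 <- 0.4
r13 <- 0.5
r23 <- 0.1

# Determinant of the 3 x 3 correlation matrix and the mean of the two correlations.
det_R <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23
r_bar <- (r12 + r13) / 2

t_value <- (r12 - r13) *
  sqrt(((n - 1) * (1 + r23)) /
       (2 * ((n - 1) / (n - 3)) * det_R + r_bar^2 * (1 - r23)^3))

print(round(t_value, 2))
```

The statistic is referred to a t distribution with n - 3 degrees of freedom.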

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8) #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
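The statistic itself is simple to compute. The sketch below uses a made-up 3 x 3 correlation matrix and sample size, and assumes (as is standard for Fisher z scores) that the sum of squared z values is scaled by n - 3; the details of cortest may differ.

```r
# Sketch of the chi square test that all population correlations are zero.
# The correlations and the sample size are made up for illustration.
n <- 103
R <- matrix(c(1.0, 0.5, 0.3,
              0.5, 1.0, 0.4,
              0.3, 0.4, 1.0), nrow = 3)

z       <- atanh(R[lower.tri(R)])     # Fisher z of each distinct correlation
chi2    <- (n - 3) * sum(z^2)         # sum of squared z scores, scaled by n - 3
df      <- length(z)                  # p * (p - 1) / 2 distinct correlations
p_value <- pchisq(chi2, df, lower.tail = FALSE)

print(c(chi2 = chi2, df = df, p = p_value))
```

For the six variables of sat.act there are 6 * 5 / 2 = 15 distinct correlations, which is the df reported above.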

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
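A small simulation shows how much φ can underestimate the latent correlation. The sketch assumes bivariate normal data with a latent correlation of 0.5, cut at zero on both variables; for that special case the population φ is (2/π) asin(ρ) and the latent correlation can be recovered as sin(π φ / 2). The seed and sample size are arbitrary, and the general tetrachoric estimate for other cut points requires numerical optimization that is not shown here.

```r
# Simulation sketch: phi underestimates the latent correlation (base R only).
set.seed(42)                       # arbitrary seed, for reproducibility
n   <- 100000
rho <- 0.5                         # latent (bivariate normal) correlation

x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

latent_r <- cor(x, y)                                   # Pearson r on latent scores
phi      <- cor(as.numeric(x > 0), as.numeric(y > 0))   # Pearson r on dichotomies

# Closed forms that hold only when both cuts are at the median.
phi_expected  <- (2 / pi) * asin(rho)    # equals 1/3 when rho = 0.5
rho_recovered <- sin(pi * phi / 2)

print(round(c(latent = latent_r, phi = phi,
              phi_expected = phi_expected, recovered = rho_recovered), 3))
```

The expected φ of 1/3 for a latent correlation of 0.5 corresponds to the values rho = 0.5 and phi = 0.33 shown in Figure 14.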

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
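The smoothing steps just described can be sketched with base R. The matrix below is a made-up example that is not positive semi-definite, and the floor eps is a hypothetical choice; cor.smooth may use a different tolerance and differ in other details.

```r
# Sketch of eigenvalue smoothing for an improper correlation matrix (base R only).
# R is a made-up matrix with one negative eigenvalue.
R <- matrix(c(1.0, 0.9, 0.3,
              0.9, 1.0, 0.9,
              0.3, 0.9, 1.0), 3, 3)

eig <- eigen(R, symmetric = TRUE)
eps <- 1e-4                                  # hypothetical floor for eigenvalues

vals <- pmax(eig$values, eps)                # make the smallest eigenvalues positive
vals <- vals * nrow(R) / sum(vals)           # rescale to sum to the number of variables

# Rebuild the matrix and rescale it so that the diagonal is exactly 1.
smoothed <- eig$vectors %*% diag(vals) %*% t(eig$vectors)
smoothed <- cov2cor(smoothed)

print(round(eig$values, 3))
print(round(smoothed, 3))
```

The smoothed matrix is close to the original but can be used in procedures (such as factor analysis) that require a proper correlation matrix.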

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use

> draw.tetra()

[Figure: scatter plot of a bivariate normal distribution (rho = 0.5, phi = 0.33) cut at τ on X and Τ on Y into four regions (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with marginal normal densities dnorm(x) marking the areas X > τ and Y > Τ.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Figure: perspective plot of the bivariate normal density (axes x, y, z), titled "Bivariate density rho = 0.5".]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \tag{1}$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or with the group means.
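The decomposition in equation (1) can be checked numerically. What follows is a minimal sketch in base R, not the statsBy implementation; the simulated data, the group sizes, and all variable names are assumptions made for illustration. The between group part is computed on group means repeated for every group member, so each group is weighted by its size.

```r
# Sketch: check r_xy = eta_xwg * eta_ywg * r_xywg + eta_xbg * eta_ybg * r_xybg
set.seed(42)
g <- rep(1:5, times = c(10, 20, 30, 15, 25))  # hypothetical grouping variable
x <- rnorm(length(g)) + 0.5 * g               # x has a group-level component
y <- 0.3 * x + rnorm(length(g)) - 0.4 * g     # y depends on x and on the group

# Between-group part: group means, repeated for each member of the group
x_bg <- ave(x, g)
y_bg <- ave(y, g)

# Within-group part: deviations from the group means
x_wg <- x - x_bg
y_wg <- y - y_bg

# eta: correlation of the raw scores with their within or between parts
eta_x_wg <- cor(x, x_wg)
eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg)
eta_y_bg <- cor(y, y_bg)

r_wg <- cor(x_wg, y_wg)  # pooled within-group correlation
r_bg <- cor(x_bg, y_bg)  # size-weighted correlation of the group means
r_xy <- cor(x, y)        # ordinary correlation ignoring the groups

recomposed <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
print(c(observed = r_xy, recomposed = recomposed))
```

The two printed values agree up to floating point error because the within group deviations are uncorrelated with the group means by construction, so the covariance of x and y splits cleanly into a within part and a between part.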

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)

faBy(sb,nfactors=5) #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
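To see why a covariance or correlation matrix is enough for multiple regression, the following minimal sketch recovers the lm slopes from the matrices alone. It uses base R and the built-in mtcars data rather than setCor; the choice of data set and the variable names are assumptions made for illustration, and the code is not how setCor itself is implemented.

```r
# Sketch: regression weights from a covariance or correlation matrix alone
vars <- c("mpg", "wt", "hp", "disp")  # mtcars ships with base R
C <- cov(mtcars[, vars])              # covariance matrix
R <- cor(mtcars[, vars])              # correlation matrix
x <- c("wt", "hp", "disp")            # predictors
y <- "mpg"                            # criterion

# Raw slopes: solve Cxx %*% b = Cxy
b_raw <- solve(C[x, x], C[x, y])

# Standardized (beta) weights: solve Rxx %*% beta = Rxy
beta_std <- solve(R[x, x], R[x, y])

# Squared multiple correlation from the betas and the validities
R2 <- sum(beta_std * R[x, y])

# Reference fit on the raw data
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
print(rbind(from_matrix = b_raw, from_lm = coef(fit)[x]))
print(c(R2_matrix = R2, R2_lm = summary(fit)$r.squared))
```

The intercept and the standard errors are not recoverable from the matrix alone: the first needs the means and the second needs the number of observations, which is why a matrix based analysis asks for the sample size before it can report significance tests.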

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.69     0.63         0.50      0.58        0.48

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.48     0.40         0.25      0.34        0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary SentCompletion FirstLetters
     3.69       3.88           3.00         1.35

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.59     0.58         0.49      0.58        0.45

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.34     0.34         0.24      0.33        0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.58     0.46         0.21      0.18        0.30

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
          0.331    0.210        0.043     0.032       0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion FirstLetters
          1.02         1.02

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.44     0.35         0.17      0.14        0.26

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.19     0.12         0.03      0.02        0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
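The path logic above can be sketched with ordinary regressions. The following base R example is an illustration only and is not the mediate function: the simulated data, the helper `paths()`, the labels `c_direct` and `c_total`, and the choice of 500 bootstrap resamples are all assumptions. It estimates a, b, and the direct path, checks that the total effect equals the direct effect plus ab, and forms a percentile bootstrap interval for ab.

```r
# Sketch: mediation paths from three regressions plus a percentile bootstrap
set.seed(123)
n <- 100
x <- rnorm(n)                       # predictor
m <- 0.5 * x + rnorm(n)             # mediator depends on x
y <- 0.3 * x + 0.4 * m + rnorm(n)   # criterion depends on x and m
dat <- data.frame(x, m, y)

# Hypothetical helper: returns the a, b, direct, and total path estimates
paths <- function(d) {
  a     <- coef(lm(m ~ x, data = d))[["x"]]   # x -> m
  full  <- coef(lm(y ~ x + m, data = d))      # y on x and m together
  total <- coef(lm(y ~ x, data = d))[["x"]]   # y on x alone
  c(a = a, b = full[["m"]], c_direct = full[["x"]], c_total = total)
}

est <- paths(dat)
ab  <- est[["a"]] * est[["b"]]      # indirect effect

# Percentile bootstrap of the indirect effect: resample rows, re-estimate ab
boot_ab <- replicate(500, {
  p <- paths(dat[sample(n, replace = TRUE), ])
  p[["a"]] * p[["b"]]
})
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(c(est, ab = ab), 3))
print(round(ci, 3))
```

For ordinary least squares the total effect is exactly the direct effect plus ab. The bootstrap is used for ab because the product of two estimates is generally not normally distributed, so a resampled interval is preferred to a normal theory standard error.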

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS . The IV (X) was THERAPY . The mediating variable(s) = ATTRIB .

Total Direct effect(c) of THERAPY on SATIS = 0.76 S.E. = 0.31 t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 S.E. = 0.32 t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.04 Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se    t   Prob
THERAPY  0.76 0.31  2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50

> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model": THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, THERAPY to SATIS c = 0.76 and c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct effect with Attribution removed is .43), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)

> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models" with THERAPY and ATTRIB predicting SATIS; plotted values 0.43, 0.4, and 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

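$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the $i$th eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

so that the $\lambda_i$ are the squared canonical correlations between the two sets.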

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
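The contrast between the set correlation and the average squared canonical correlation can be seen in a small numerical sketch. It is written in base R and is not the setCor implementation; the simulated data and all object names are assumptions. It assumes that the λ values are the squared canonical correlations, obtained here as the eigenvalues of the product of the inverse x correlations, the cross correlations, the inverse y correlations, and the transposed cross correlations, and it checks them against stats::cancor. The last lines reuse the four squared canonical correlations printed for the Thurstone example in section 5.1.

```r
# Sketch: set correlation and average squared canonical correlation
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 2), n, 2)                             # first set, 2 variables
y <- cbind(x[, 1] + rnorm(n), x[, 2] + rnorm(n), rnorm(n))  # second set, 3 variables

R   <- cor(cbind(x, y))
Rxx <- R[1:2, 1:2]
Ryy <- R[3:5, 3:5]
Rxy <- R[1:2, 3:5]

# Eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- sort(Re(eigen(M)$values), decreasing = TRUE)

# Cross-check against the canonical correlations from base R
cc2 <- cancor(x, y)$cor^2

set_R2  <- 1 - prod(1 - lambda)  # Cohen's set correlation
avg_cc2 <- mean(lambda)          # average squared canonical correlation
print(round(c(set_R2 = set_R2, average = avg_cc2), 3))

# Same arithmetic applied to the printed Thurstone values
lambda_thurstone <- c(0.6280, 0.1478, 0.0076, 0.0049)
thurstone_set_R2 <- round(1 - prod(1 - lambda_thurstone), 2)  # 0.69
thurstone_avg    <- round(mean(lambda_thurstone), 2)          # 0.2
print(c(thurstone_set_R2, thurstone_avg))
```

In the Thurstone values a single large canonical correlation (0.628) is enough to push the set correlation to 0.69, while the average squared canonical correlation is only 0.2. This is the situation in which the average may be the more appropriate summary.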

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the covariance matrix (C, found with cov) rather than the raw data.

> C <- cov(sat.act,use="pairwise")

> model1 <- lm(ACT~ gender + education + age, data=sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ . The IV (X) was ACT gender ACTXgndr . The mediating variable(s) = education .

Total Direct effect(c) of ACT on SATQ = 0.58 S.E. = 0.03 t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59 S.E. = 0.03 t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.02 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14 S.E. = 0.03 t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14 S.E. = 0.03 t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.01 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 S.E. = 0.03 t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0 S.E. = 0.03 t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = 0 Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: path diagram titled "Moderation model" with ACT, gender, and ACTXgndr predicting SATQ through education]

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
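The summary lines of this output follow from the multiple R-squared and the degrees of freedom alone, which is why a matrix based analysis can report the same tests once the number of observations is supplied. The base R sketch below takes the printed values (R-squared of 0.0272, 3 predictors, and 700 observations, as implied by the 696 residual degrees of freedom) and recomputes the adjusted R-squared, the F statistic, and its p-value; the object names are assumptions.

```r
# Sketch: recover adjusted R2, F, and p from R2, n, and the number of predictors
R2  <- 0.0272   # Multiple R-squared as printed above (rounded)
n   <- 700      # observations implied by 696 residual degrees of freedom
k   <- 3        # predictors: gender, education, age
df1 <- k
df2 <- n - k - 1

adj_R2 <- 1 - (1 - R2) * (n - 1) / df2        # adjusted (shrunken) R2
F_stat <- (R2 / df1) / ((1 - R2) / df2)       # overall F test
p_val  <- pf(F_stat, df1, df2, lower.tail = FALSE)

print(round(c(adj_R2 = adj_R2, F = F_stat), 4))
print(signif(p_val, 3))
```

Because the printed R-squared is rounded to four places, the recomputed values agree with the printed ones (0.02301, 6.487, and 0.0002476) only up to rounding.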

Compare this with the output from setCor.

> #compare with setCor

> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender education  age
  1.01      1.45 1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08

                                                      F and df of Cohens Set Correlation 726 9 168186

                                                      Unweighted correlation between the two sets = 001

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
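As a rough check on the summary values above, here is a minimal base R sketch. It assumes the standard definition of Cohen's set correlation, R2 = 1 - prod(1 - rho_i^2), where the rho_i^2 are the squared canonical correlations, and it uses the rounded values from the printed output. The second half illustrates the symmetry claim on the built-in mtcars data, which is only a stand-in for the data analyzed in the text. The helper set_r2_det is a hypothetical name and is not part of psych.

```r
# Squared canonical correlations as printed (rounded) in the setCor output.
rho2 <- c(0.050, 0.033, 0.008)

# Cohen's set correlation and the average squared canonical correlation.
set_r2   <- 1 - prod(1 - rho2)
avg_rho2 <- mean(rho2)
print(round(c(set_r2 = set_r2, average = avg_rho2), 2))

# Symmetry check on a stand-in data set (mtcars, shipped with base R).
x <- c("mpg", "disp")
y <- c("hp", "wt", "qsec")
R <- cor(mtcars[, c(x, y)])

# Determinant form: R2 = 1 - |R| / (|Rxx| |Ryy|).
# Swapping the roles of the two sets leaves the value unchanged.
set_r2_det <- function(R, a, b) {
  1 - det(R[c(a, b), c(a, b)]) /
    (det(R[a, a, drop = FALSE]) * det(R[b, b, drop = FALSE]))
}
r2_xy <- set_r2_det(R, x, y)
r2_yx <- set_r2_det(R, y, x)

# The same quantity computed from base R canonical correlations.
cc    <- cancor(as.matrix(mtcars[, x]), as.matrix(mtcars[, y]))$cor
r2_cc <- 1 - prod(1 - cc^2)
print(c(r2_xy = r2_xy, r2_yx = r2_yx, r2_cc = r2_cc))
```

With the rounded inputs, the first two values round to 0.09 and 0.03, which agrees with the reported set correlation and average squared canonical correlation.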

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3     h2     u2    com
Sentences         0.91  -0.04   0.04   0.82   0.18   1.01
Vocabulary        0.89   0.06  -0.03   0.84   0.16   1.01
Sent.Completion   0.83   0.04   0.00   0.73   0.27   1.00
First.Letters     0.00   0.86   0.00   0.73   0.27   1.00
4.Letter.Words   -0.01   0.74   0.10   0.63   0.37   1.04
Suffixes          0.18   0.63  -0.08   0.50   0.50   1.20
Letter.Series     0.03  -0.01   0.84   0.72   0.28   1.00
Pedigrees         0.37  -0.05   0.47   0.50   0.50   1.93
Letter.Group     -0.06   0.21   0.64   0.53   0.47   1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
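To make the idea of these converters concrete, the following base R sketch writes the lower triangle of a correlation matrix as the lines of a LaTeX tabular environment. It only illustrates the kind of transformation a function such as cor2latex performs. The helper lower_to_latex is a hypothetical name introduced here, it is not psych's implementation, and the real functions offer many more options (APA rules, captions, font size). The input is the matrix of factor intercorrelations from Table 2.

```r
# Hypothetical helper (not part of psych): format the lower triangle of a
# correlation matrix as the lines of a LaTeX tabular environment.
lower_to_latex <- function(R, digits = 2) {
  cells <- matrix(formatC(R, format = "f", digits = digits),
                  nrow = nrow(R), dimnames = dimnames(R))
  cells[upper.tri(cells)] <- ""                 # blank out the upper triangle
  body <- paste(rownames(R),
                apply(cells, 1, paste, collapse = " & "),
                sep = " & ")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(R)), "}"),
    paste0(" & ", paste(colnames(R), collapse = " & "), " \\\\"),
    paste0(body, " \\\\"),
    "\\end{tabular}")
}

# Factor intercorrelations as reported in Table 2.
Phi <- matrix(c(1.00, 0.59, 0.54,
                0.59, 1.00, 0.52,
                0.54, 0.52, 1.00),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("MR1", "MR2", "MR3"),
                              c("MR1", "MR2", "MR3")))

latex_lines <- lower_to_latex(Phi)
writeLines(latex_lines)
```

The resulting lines can be pasted into a LaTeX document or written to a file with writeLines(latex_lines, "table.tex"), where the file name is arbitrary.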

                                                      7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
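Several of the helpers listed above wrap short computations. The base R sketch below shows the underlying arithmetic for the Fisher z transformation, the geometric and harmonic means, reverse coding, and the block structure of a super matrix. It is an illustration under stated assumptions (complete data, positive values, a 1 to 6 response scale) and does not call psych; the package functions take additional arguments and may treat missing values differently.

```r
# Fisher r-to-z: z = 0.5 * log((1 + r) / (1 - r)), i.e. atanh(r).
r <- 0.5
z <- atanh(r)

# Geometric and harmonic means of positive values.
x <- c(1, 2, 4, 8)
geo_mean <- exp(mean(log(x)))
har_mean <- 1 / mean(1 / x)

# Reverse coding on an assumed 1-6 response scale: new = (max + min) - old.
item     <- c(1, 2, 6, 4)
reversed <- (6 + 1) - item

# Super matrix: A in the top left, B in the lower right, zeros elsewhere.
A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 3)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))

print(c(z = z, geo_mean = geo_mean, har_mean = har_mean))
print(reversed)
print(S)
```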

                                                      8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

                                                      9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")

                                                      10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                      11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

                                                      References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis. 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


                                                      spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

                                                      table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

                                                      vegetables 50 51violinBy 14 18vss 5 6

                                                      weighted least squares 6withinBetween 37

                                                      xtable 47

                                                      62

• Jump starting the psych package: a guide for the impatient
• Psychometric functions are summarized in the second vignette
• Overview of this and related documents
• Getting started
• Basic data analysis
  • Getting the data by using read.file
  • Data input from the clipboard
  • Basic descriptive statistics
    • Outlier detection using outlier
    • Basic data cleaning using scrub
    • Recoding categorical variables into dummy coded variables
  • Simple descriptive graphics
    • Scatter Plot Matrices
    • Density or violin plots
    • Means and error bars
    • Error bars for tabular data
    • Two dimensional displays of means and errors
    • Back to back histograms
    • Correlational structure
    • Heatmap displays of correlational structure
  • Testing correlations
  • Polychoric, tetrachoric, polyserial, and biserial correlations
• Multilevel modeling
  • Decomposing data into within and between level correlations using statsBy
  • Generating and displaying multilevel data
  • Factor analysis by groups
• Multiple Regression, mediation, moderation, and set correlations
  • Multiple regression from data or correlation matrices
  • Mediation and Moderation analysis
  • Set Correlation
• Converting output to APA style tables using LaTeX
• Miscellaneous functions
• Data sets
• Development version and a users guide
• Psychometric Theory
• SessionInfo

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, raw probability values are not corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
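To make the difference between raw and Holm-adjusted probabilities concrete, here is a minimal base R sketch. It uses only p.adjust from the stats package; the four p values are made up for illustration and are not taken from sat.act, and the by-hand version is just one way of writing out the step-down rule.

```r
# Raw versus Holm-adjusted p values (illustrative p values, not real data).
raw_p <- c(0.01, 0.04, 0.03, 0.20)

# p.adjust() from the stats package applies the Holm (1979) step-down rule.
holm_p <- p.adjust(raw_p, method = "holm")

# The same adjustment by hand: sort the p values, multiply each by the number
# of tests still in play, force the results to be non-decreasing, cap at 1,
# and put them back in the original order.
m <- length(raw_p)
ord <- order(raw_p)
by_hand <- pmin(1, cummax((m - seq_len(m) + 1) * raw_p[ord]))[order(ord)]

print(round(holm_p, 2))   # 0.04 0.09 0.09 0.20
```

The smallest raw p value is multiplied by the full number of tests, which is why a correlation that looks significant below the diagonal may no longer be so above it.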

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),


> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02   0.53
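The numbers in case 1 can be checked by hand. The following is a minimal base R sketch, assuming the usual t test for a Pearson correlation with n - 2 degrees of freedom and a confidence interval built on the Fisher z transformation. The variable names are illustrative, and this is not the source code of r.test.

```r
# Significance of a single correlation, r = .3 with n = 50.
n <- 50
r <- 0.3

# t statistic with n - 2 degrees of freedom and its two-tailed probability.
t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(abs(t_value), df = n - 2, lower.tail = FALSE)

# 95% confidence interval: transform to Fisher z, add and subtract
# 1.96 standard errors (se = 1/sqrt(n - 3)), and transform back.
z <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(t_value, 2))   # 2.18
print(round(ci, 2))        # 0.02 0.53
```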

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32
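Case 2 is a short calculation on Fisher z scores. A minimal base R sketch, assuming two independent samples and the standard error sqrt(1/(n - 3) + 1/(n2 - 3)); the names n1, n2, r_a, and r_b are illustrative only.

```r
# Difference between two independent correlations, .4 and .6, each with n = 30.
n1 <- 30
n2 <- 30
r_a <- 0.4
r_b <- 0.6

# Fisher z transform both correlations, then divide the difference by the
# standard error of a difference between two independent z scores.
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
z_value <- abs(atanh(r_a) - atanh(r_b)) / se_diff
p_value <- 2 * pnorm(z_value, lower.tail = FALSE)

print(round(z_value, 2))   # 0.99
print(round(p_value, 2))   # 0.32
```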

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37
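For case A, two correlations that share a variable are compared with a t test on n - 3 degrees of freedom. The sketch below writes out the Williams form of that test as it is usually presented following Steiger (1980); treat the formula as an assumption about what r.test computes rather than a copy of its code. It does reproduce the t value shown above.

```r
# Two dependent correlations sharing variable 1: r12 = .4 versus r13 = .5,
# with r23 = .1 the correlation between the two non-shared variables.
n <- 103
r12 <- 0.4
r13 <- 0.5
r23 <- 0.1

# Determinant of the 3 x 3 correlation matrix of the three variables.
det_R <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23
r_bar <- (r12 + r13) / 2

# Williams t (assumed form), referred to a t distribution with n - 3 df.
t_value <- (r12 - r13) *
  sqrt((n - 1) * (1 + r23) /
       (2 * ((n - 1) / (n - 3)) * det_R + r_bar^2 * (1 - r23)^3))
p_value <- 2 * pt(abs(t_value), df = n - 3, lower.tail = FALSE)

print(round(t_value, 2))   # -0.89
```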

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)  #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
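The logic of this test fits in a few lines. The following base R sketch assumes the Fisher z version of the statistic, (n - 3) times the sum of the squared z scores of the distinct off-diagonal correlations, with one degree of freedom per correlation. The helper name steiger_chisq is hypothetical, and the built-in attitude data set stands in for sat.act so that the example needs no extra packages.

```r
# Test that all off-diagonal correlations in a matrix are zero
# (hypothetical helper; a sketch of the idea, not the cortest source).
steiger_chisq <- function(R, n) {
  z <- atanh(R[lower.tri(R)])      # Fisher z of each distinct correlation
  chisq <- (n - 3) * sum(z^2)      # sum of squared z scores, scaled by n - 3
  df <- length(z)                  # p * (p - 1) / 2 correlations
  list(chisq = chisq, df = df,
       p = pchisq(chisq, df, lower.tail = FALSE))
}

# Seven variables give 7 * 6 / 2 = 21 degrees of freedom.
res <- steiger_chisq(cor(attitude), n = nrow(attitude))
print(res$df)   # 21

# An identity matrix has no correlations to detect: chi square is 0, p is 1.
null_res <- steiger_chisq(diag(4), n = 100)
```

For the six sat.act variables the same count gives 6 * 5 / 2 = 15 degrees of freedom, which matches the output above.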

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of the multiple cuts is the polychoric correlation.
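A small simulation shows how much φ underestimates the latent correlation. The base R sketch below assumes a bivariate normal distribution with a latent correlation of .5 that is cut at zero on both variables. In that special case (both cuts at the median) the latent correlation can be recovered in closed form as sin(π φ / 2); other cut points need the iterative estimation that a tetrachoric routine performs. The seed and sample size are arbitrary.

```r
# Dichotomizing bivariate normal data lowers the observed correlation.
set.seed(42)
n <- 100000
rho <- 0.5

# Generate bivariate normal data with latent correlation rho.
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Cut both variables at zero and correlate the resulting 0/1 scores.
phi <- cor(as.numeric(x > 0), as.numeric(y > 0))

# For cuts at the median only, the latent correlation is sin(pi * phi / 2).
rho_recovered <- sin(pi * phi / 2)

print(round(c(phi = phi, recovered = rho_recovered), 2))
```

With these settings φ comes out close to 1/3, in line with the phi = 0.33 reported for rho = 0.5 in Figure 14.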

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix built from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
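The smoothing steps described above can be sketched in base R. This is an illustration of the idea, not the cor.smooth source: the function name smooth_cor, the floor value, and the deliberately impossible 3 x 3 matrix are all assumptions made for the example.

```r
# Eigenvalue smoothing of a correlation matrix that is not positive definite
# (hypothetical helper illustrating the steps described in the text).
smooth_cor <- function(R, floor = 1e-6) {
  e <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, floor)            # lift small or negative eigenvalues
  vals <- vals * ncol(R) / sum(vals)       # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  S <- cov2cor(S)                          # put ones back on the diagonal
  dimnames(S) <- dimnames(R)
  S
}

# An impossible pattern: 1 and 2 correlate .9, 1 and 3 correlate .9,
# yet 2 and 3 correlate -.9.
R_bad <- matrix(c( 1.0,  0.9,  0.9,
                   0.9,  1.0, -0.9,
                   0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)

min(eigen(R_bad, symmetric = TRUE)$values)   # negative, so R_bad is not a proper correlation matrix
R_ok <- smooth_cor(R_bad)
min(eigen(R_ok, symmetric = TRUE)$values)    # small but positive
```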

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use


> draw.tetra()

[Figure 14 plot: a bivariate normal distribution (rho = 0.5, phi = 0.33) divided by the cut points τ on X and Τ on Y into four regions (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with the marginal normal densities, dnorm(x), shown for X > τ and Y > Τ.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))

[Figure 15 plot: perspective view of the bivariate density, rho = 0.5, with axes x, y, and z.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
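Equation 1 is an exact identity when the between group scores are the group means assigned to every member of the group, so that larger groups carry more weight. The base R sketch below checks this on simulated data. The grouping variable, group sizes, and effect sizes are arbitrary choices for illustration, and ave (from the stats package) is used to compute the group means.

```r
# Verify the within/between decomposition of a correlation on simulated data.
set.seed(1)
g <- rep(1:5, times = c(10, 20, 30, 15, 25))   # five groups of unequal size
x <- rnorm(length(g)) + g
y <- 0.5 * x + rnorm(length(g)) - 0.8 * g

# Between group part: each person gets the group mean.
x_bg <- ave(x, g)
y_bg <- ave(y, g)

# Within group part: deviations from the group mean.
x_wg <- x - x_bg
y_wg <- y - y_bg

# The etas are the correlations of the raw scores with each part.
r_xy <- cor(x, y)
rebuilt <- cor(x, x_wg) * cor(y, y_wg) * cor(x_wg, y_wg) +
           cor(x, x_bg) * cor(y, y_bg) * cor(x_bg, y_bg)

print(c(observed = r_xy, rebuilt = rebuilt))
```

The two printed values agree to machine precision, even though the within and between group correlations themselves can differ in size and in sign.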

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
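Why a correlation matrix is enough can be seen from the normal equations for standardized variables: the β weights are the solution of Rxx β = rxy, and the squared multiple correlation is the sum of β times rxy. The base R sketch below assumes nothing beyond solve and lm; the built-in attitude data set and the choice of predictors are arbitrary stand-ins for illustration, and this is not the setCor source.

```r
# Multiple regression from a correlation matrix, checked against lm().
R <- cor(attitude)
y_name <- "rating"
x_names <- c("complaints", "privileges", "learning")

Rxx <- R[x_names, x_names]     # correlations among the predictors
rxy <- R[x_names, y_name]      # correlations of the predictors with the criterion

beta <- solve(Rxx, rxy)        # standardized regression (beta) weights
R2 <- sum(beta * rxy)          # squared multiple correlation

# The same weights from raw data: standardize everything, then use lm().
std <- as.data.frame(scale(attitude))
fit <- lm(rating ~ complaints + privileges + learning, data = std)

print(round(cbind(from_matrix = beta, from_lm = coef(fit)[-1]), 2))
```

The two columns are identical, which is why the raw data are not needed for the weights themselves. The sample size is still needed for standard errors and probabilities.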

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.69     0.63         0.50      0.58        0.48

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.48     0.40         0.25      0.34        0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary SentCompletion FirstLetters
     3.69       3.88           3.00         1.35

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.59     0.58         0.49      0.58        0.45

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.34     0.34         0.24      0.33        0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
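Why the weights stay the same while the multiple correlation shrinks can be illustrated with a small base R sketch. It does not use setCor or the Thurstone data; the simulated variables (z1, z2, x1, x2, y) and their generating coefficients are assumptions made only for illustration. It regresses y on the predictors with the covariates included, then repeats the regression after residualizing y and each predictor on the covariates.

```r
# Simulated data; every variable name and coefficient here is illustrative only.
set.seed(1)
n  <- 500
z1 <- rnorm(n)
z2 <- rnorm(n)
x1 <- 0.5 * z1 + rnorm(n)
x2 <- 0.5 * z2 + rnorm(n)
y  <- 0.6 * z1 + 0.4 * z2 + 0.3 * x1 + 0.2 * x2 + rnorm(n)

# Full regression: covariates (z) and predictors (x) entered together.
full <- lm(y ~ x1 + x2 + z1 + z2)

# Partialled regression: remove z from y and from each x, then regress residuals.
ry   <- resid(lm(y  ~ z1 + z2))
rx1  <- resid(lm(x1 ~ z1 + z2))
rx2  <- resid(lm(x2 ~ z1 + z2))
part <- lm(ry ~ rx1 + rx2)

# The weights for x1 and x2 agree; R2 is smaller once z no longer contributes.
b_full  <- unname(coef(full)[c("x1", "x2")])
b_part  <- unname(coef(part)[c("rx1", "rx2")])
r2_full <- summary(full)$r.squared
r2_part <- summary(part)$r.squared

print(rbind(full = b_full, partialled = b_part))
print(c(full = r2_full, partialled = r2_part))
```

The sketch works in raw units on raw data, whereas the example above works from a correlation matrix, so the scale of the weights differs even though the principle is the same.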

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,\ldots,i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
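The path logic can be sketched in base R with three ordinary regressions and a percentile bootstrap of ab. This is only an illustration on simulated data, not the mediate function's implementation; the variable names, the generating coefficients, the helper function paths, and the choice of 1000 resamples are all assumptions. The sketch separates the total effect of x on y from the direct effect that remains once m is in the model.

```r
# Simulated data; names and coefficients are illustrative assumptions.
set.seed(42)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.2 * x + rnorm(n)
d <- data.frame(x, m, y)

# Hypothetical helper: estimate the paths from three ordinary regressions.
paths <- function(d) {
  a   <- unname(coef(lm(m ~ x, data = d))["x"])   # path a: x -> m
  fit <- lm(y ~ x + m, data = d)
  b   <- unname(coef(fit)["m"])                   # path b: m -> y, holding x constant
  cd  <- unname(coef(fit)["x"])                   # direct effect of x, holding m constant
  ct  <- unname(coef(lm(y ~ x, data = d))["x"])   # total effect of x on y
  c(a = a, b = b, c_total = ct, c_direct = cd, ab = a * b)
}
est <- paths(d)

# Percentile bootstrap of the indirect effect ab (resample rows with replacement).
n_boot  <- 1000
boot_ab <- replicate(n_boot, paths(d[sample(n, replace = TRUE), ])[["ab"]])
ci      <- quantile(boot_ab, c(0.025, 0.975))

print(round(est, 2))
print(round(ci, 2))
```

With ordinary least squares the total effect equals the direct effect plus ab exactly, which gives a quick check on any reported set of path estimates.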


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  SE = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  SE = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Path diagram: Mediation model. THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76 (.43 once Attribution is in the model), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2, 3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Path diagram: Regression Models. THERAPY and ATTRIB predicting SATIS; path labels 0.43, 0.4, 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

with $R_{yx}$ the transpose of $R_{xy}$.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic, based upon the average canonical correlation, might then be more appropriate.
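Both statistics can be sketched in base R. The example below is an illustration on simulated data rather than setCor's own code; the variable names and data-generating choices are assumptions. It takes the squared canonical correlations as the eigenvalues of the matrix Rxx^-1 Rxy Ryy^-1 Ryx, forms the set correlation as one minus the product of (1 - λ), and compares it with the average squared canonical correlation. As a cross-check it uses cancor from the stats package and the determinant identity 1 - |R| / (|Rxx| |Ryy|).

```r
# Simulated data: one strongly shared source (g), otherwise unrelated variables.
set.seed(7)
n <- 300
g <- rnorm(n)
X <- cbind(x1 = g + rnorm(n), x2 = rnorm(n), x3 = 0.5 * g + rnorm(n))
Y <- cbind(y1 = g + rnorm(n), y2 = rnorm(n))

R   <- cor(cbind(X, Y))
ix  <- 1:3
iy  <- 4:5
Rxx <- R[ix, ix]
Ryy <- R[iy, iy]
Rxy <- R[ix, iy]

# Squared canonical correlations are the leading eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx.
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
k      <- min(length(ix), length(iy))
lambda <- sort(Re(eigen(M)$values), decreasing = TRUE)[seq_len(k)]

set_R2  <- 1 - prod(1 - lambda)                    # set correlation
avg_cc2 <- mean(lambda)                            # average squared canonical correlation
det_R2  <- 1 - det(R) / (det(Rxx) * det(Ryy))      # same quantity via determinants
cc2     <- cancor(X, Y)$cor^2                      # cross-check with stats::cancor

print(round(c(set_R2 = set_R2, avg_cc2 = avg_cc2), 3))

# Squared canonical correlations of .405 and .023, as in the covariate example,
# give a set correlation of about .42.
worked <- 1 - prod(1 - c(0.405, 0.023))
print(round(worked, 2))
```

Because the set correlation can never be smaller than the largest squared canonical correlation, one strongly related pair of variables is enough to push it up, which is the situation where the average is the more cautious summary.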

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus will report either standardized or raw β weights (standardized if using the correlation matrix). Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  SE = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  SE = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14  SE = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  SE = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0  SE = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  SE = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0
R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Path diagram: Moderation model. ACT, gender, and ACTXgndr predict SATQ with education as the mediator. Paths to education: 0.16, 0.09, -0.01; education to SATQ: -0.04; ACT: c = 0.58, c' = 0.59; gender: c = -0.14, c' = -0.14; ACTXgndr: c = 0, c' = 0.]

Figure 18: Moderated multiple regression requires the raw data.

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
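That lm on the raw data and a matrix-based analysis report the same multiple R2 for ACT (.0272) reflects a general result: regression weights and R2 can be recovered from the covariance matrix alone. A minimal base R sketch on simulated, complete data follows; the variable names and coefficients are assumptions, and this is not setCor's implementation.

```r
# Simulated data with no missing values; names and coefficients are illustrative.
set.seed(3)
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 0.5 * d$x1 - 0.3 * d$x2 + 0.1 * d$x3 + rnorm(n)

C  <- cov(d)
xs <- c("x1", "x2", "x3")

# Raw weights from the covariance matrix: b = Cxx^-1 Cxy.
b_raw <- unname(solve(C[xs, xs], C[xs, "y"]))
# Multiple R2 = b' Cxy / var(y).
R2 <- sum(b_raw * C[xs, "y"]) / C["y", "y"]

# Standardized weights come from the correlation matrix in the same way.
R     <- cov2cor(C)
b_std <- unname(solve(R[xs, xs], R[xs, "y"]))

# Reference fit on the raw data.
fit <- lm(y ~ x1 + x2 + x3, data = d)

print(round(rbind(from_cov = b_raw, from_lm = unname(coef(fit)[xs])), 4))
print(round(c(from_cov = R2, from_lm = summary(fit)$r.squared), 4))
```

With pairwise deletion of missing data, as in the cov call above, the matrix-based and raw-data results need not agree exactly, because different cases can contribute to different covariances.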

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00


                                                        7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

                                                        blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design

                                                        df2latex is useful for taking tabular output (such as a correlation matrix or that of de-

                                                        scribe and converting it to a LATEX table May be used when Sweave is not conve-nient

                                                        cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex

                                                        cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-

                                                        dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data

                                                        fisherz Convert a correlation to the corresponding Fisher z score

                                                        geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data

                                                        ICC and cohenkappa are typically used to find the reliability for raters

                                                        headtail combines the head and tail functions to show the first and last lines of a dataset or output

                                                        topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

                                                        mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe

                                                        prep finds the probability of replication for an F t or r and estimate effect size

                                                        partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)

                                                        rangeCorrection will correct correlations for restriction of range

                                                        reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
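The arithmetic behind a few of the helpers listed above can be sketched in base R. These are illustrative re-implementations under hypothetical names (fisher_z, geometric_mean, harmonic_mean, super_matrix, reverse_code), not the psych source code, and they omit the missing-data handling and extra arguments that the packaged functions provide.

```r
# Illustrative base-R sketches; all function names here are hypothetical.

# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r))
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))

# Geometric mean: exponential of the mean log (for ratios or growth rates)
geometric_mean <- function(x) exp(mean(log(x)))

# Harmonic mean: reciprocal of the mean reciprocal (for rates)
harmonic_mean <- function(x) 1 / mean(1 / x)

# Super matrix: A in the top left, B in the lower right, zeros elsewhere
super_matrix <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

# Reverse code items scored on a scale running from mini to maxi
reverse_code <- function(x, mini, maxi) (maxi + mini) - x

z  <- fisher_z(0.5)
gm <- geometric_mean(c(1, 2, 4))
hm <- harmonic_mean(c(1, 2, 4))
S  <- super_matrix(diag(2), matrix(1, 3, 3))
rc <- reverse_code(c(1, 2, 6), mini = 1, maxi = 6)
```

With psych loaded, the corresponding calls would be fisherz, geometric.mean, harmonic.mean, superMatrix and reverse.code; consult their help pages for the exact arguments.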

                                                        8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
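As a quick check on the sizes quoted above, the data sets can be inspected directly. The following is a minimal sketch that assumes the psych package is installed and that Thurstone and sat.act can be reached as psych::Thurstone and psych::sat.act; the variable names thurstone_dim and satact_dim are hypothetical, and the availability of the other data sets may differ between releases.

```r
# Inspect two of the example data sets, guarded so that the code does
# nothing harmful when psych is not installed.
if (requireNamespace("psych", quietly = TRUE)) {
  thurstone_dim <- dim(psych::Thurstone)  # correlation matrix of ability items
  satact_dim    <- dim(psych::sat.act)    # rows are subjects, columns are variables
  print(thurstone_dim)
  print(satact_dim)
  print(names(psych::sat.act))
} else {
  message("psych is not installed; skipping the data set examples")
}
```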

                                                        9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")

                                                        10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                        11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0
[9] lattice_0.20-34

                                                        References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02   0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)  #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23
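The first three tests can be checked by hand. The following is a minimal base R sketch, not the psych source: the function names are hypothetical, the dependent-correlation case assumes Williams's form of the t statistic, and the degrees of freedom used for its p value (n - 3) are likewise an assumption. Rounded to two decimals, the statistics agree with the r.test output shown above (the sign of z in the second test depends only on which correlation is subtracted from which).

```r
# Base R sketch of three of the tests reported by r.test (hypothetical helper names).

# Case 1: does a single correlation r, based on n cases, differ from zero?
single_r_test <- function(n, r, alpha = 0.05) {
  t  <- r * sqrt(n - 2) / sqrt(1 - r^2)          # t statistic with n - 2 df
  p  <- 2 * pt(-abs(t), df = n - 2)              # two-tailed probability
  se <- 1 / sqrt(n - 3)                          # standard error of Fisher z
  ci <- tanh(atanh(r) + c(-1, 1) * qnorm(1 - alpha / 2) * se)
  list(t = t, p = p, ci = ci)
}

# Case 2: do two correlations from independent samples differ?
independent_r_test <- function(n, r12, r34, n2 = n) {
  z <- (atanh(r12) - atanh(r34)) / sqrt(1 / (n - 3) + 1 / (n2 - 3))
  list(z = z, p = 2 * pnorm(-abs(z)))
}

# Case 3: do r12 and r13 differ when both come from the same sample?
# (Williams's t is assumed here; df = n - 3 is also an assumption.)
dependent_r_test <- function(n, r12, r13, r23) {
  detR <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23
  rbar <- (r12 + r13) / 2
  t <- (r12 - r13) * sqrt(((n - 1) * (1 + r23)) /
         (2 * ((n - 1) / (n - 3)) * detR + rbar^2 * (1 - r23)^3))
  list(t = t, p = 2 * pt(-abs(t), df = n - 3))
}

case1 <- single_r_test(50, 0.3)
case2 <- independent_r_test(30, 0.4, 0.6)
case3 <- dependent_r_test(103, r12 = 0.4, r13 = 0.5, r23 = 0.1)

round(case1$t, 2)        # 2.18
round(case1$ci, 2)       # 0.02 0.53
round(abs(case2$z), 2)   # 0.99
round(case3$t, 2)        # -0.89
```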

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although the answer is obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
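The logic of the test can be sketched in a few lines of base R. This is an illustration of the idea described above, not the cortest source: the function name, the example matrix, and the sample size are hypothetical, and scaling the sum of squares by n - 3 is an assumption of this sketch.

```r
# Sketch of a Steiger-style chi square test that all correlations are zero.
steiger_chisq <- function(R, n, fisher = TRUE) {
  p <- ncol(R)
  r <- R[lower.tri(R)]                  # the p(p-1)/2 unique correlations
  if (fisher) r <- atanh(r)             # Fisher z equivalents
  chisq <- (n - 3) * sum(r^2)           # scaled sum of squares (assumed scaling)
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p = pchisq(chisq, df, lower.tail = FALSE))
}

# Hypothetical 3 x 3 correlation matrix and sample size, for illustration only
R <- matrix(c(1.0, 0.3, 0.2,
              0.3, 1.0, 0.4,
              0.2, 0.4, 1.0), nrow = 3)

res      <- steiger_chisq(R, n = 100)
res_raw  <- steiger_chisq(R, n = 100, fisher = FALSE)
res_null <- steiger_chisq(diag(3), n = 100)   # identity matrix: chi square of 0
```

With an identity matrix the statistic is exactly zero, which is the null hypothesis the test evaluates; any non-zero off-diagonal element adds to the chi square.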

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
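The attenuation of φ relative to the latent correlation can be seen in a small simulation. This base R sketch is only an illustration (the seed, sample size, and variable names are arbitrary choices). It relies on the standard closed form for bivariate normal data cut at zero on both variables, φ = (2/π) asin(ρ), rather than on the iterative estimation that a tetrachoric routine performs for arbitrary cut points.

```r
set.seed(42)                                  # arbitrary seed, for reproducibility
n   <- 100000
rho <- 0.5                                    # latent correlation

# Simulate a bivariate normal pair with correlation rho
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Dichotomize both variables at a cut point of 0
x_cut <- as.numeric(x > 0)
y_cut <- as.numeric(y > 0)

phi_observed  <- cor(x_cut, y_cut)            # phi is a Pearson r on 0/1 data
phi_expected  <- 2 / pi * asin(rho)           # closed form for cuts at 0: 1/3
rho_recovered <- sin(pi * phi_observed / 2)   # invert the closed form

round(c(phi_observed = phi_observed, phi_expected = phi_expected,
        rho_recovered = rho_recovered), 2)
```

For ρ = 0.5 the expected φ is one third, so the observed φ falls well below the latent correlation, while inverting the closed form recovers a value close to 0.5.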

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix assembled from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
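The smoothing steps described above can be sketched in base R. This is an illustration of the idea under stated assumptions, not the cor.smooth source: the function name, the tolerance, the replacement value for offending eigenvalues, and the example matrix are all hypothetical.

```r
# Sketch of eigenvalue smoothing for an improper correlation matrix.
smooth_sketch <- function(R, eig.tol = 1e-12) {
  e <- eigen(R, symmetric = TRUE)
  vals <- e$values
  vals[vals < eig.tol] <- 100 * eig.tol        # make non-positive eigenvalues positive
  vals <- vals * ncol(R) / sum(vals)           # rescale to sum to the number of variables
  smoothed <- e$vectors %*% diag(vals) %*% t(e$vectors)
  smoothed <- cov2cor(smoothed)                # restore a unit diagonal
  dimnames(smoothed) <- dimnames(R)
  smoothed
}

# Hypothetical improper "correlation" matrix: no data could produce this pattern
bad <- matrix(c(1.0,  0.9,  0.9,
                0.9,  1.0, -0.9,
                0.9, -0.9,  1.0), nrow = 3)

min_before <- min(eigen(bad, symmetric = TRUE)$values)    # negative
fixed      <- smooth_sketch(bad)
min_after  <- min(eigen(fixed, symmetric = TRUE)$values)  # no longer negative
round(fixed, 2)
```

The smoothed matrix keeps the signs of the original correlations but shrinks their magnitudes until the matrix is (just) positive semi-definite.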

                                                          4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use


> draw.tetra()

[Figure 14 plot: a bivariate normal distribution of X and Y (both axes running from -3 to 3) with rho = 0.5 and phi = 0.33, divided into four quadrants by the cut points τ on X and Τ on Y. Marginal normal densities, dnorm(x), mark the regions X > τ and Y > Τ.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20, cuts=c(0,0))

[Figure 15 plot: perspective plot of the bivariate density (axes x, y, z), rho = 0.5.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or with the group means.
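The decomposition can be verified numerically. The following base R sketch uses simulated data (the group structure, effect sizes, and variable names are hypothetical) and treats the between group part of each score as the group mean assigned to every member of the group, so that the between group correlation is weighted by group size. Under those assumptions the identity holds exactly, not just approximately.

```r
set.seed(1)                                   # arbitrary seed
g <- rep(1:10, each = 20)                     # 10 hypothetical groups of 20 cases
group_x <- rnorm(10)[g]                       # group-level effects on x
group_y <- 0.8 * group_x + 0.5 * rnorm(10)[g] # group-level effects on y
x <- group_x + rnorm(200)                     # individual scores
y <- group_y - 0.5 * (x - group_x) + rnorm(200)

x_bg <- ave(x, g); y_bg <- ave(y, g)          # between part: each case's group mean
x_wg <- x - x_bg;  y_wg <- y - y_bg           # within part: deviation from group mean

eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)
r_wg <- cor(x_wg, y_wg)                       # pooled within group correlation
r_bg <- cor(x_bg, y_bg)                       # between group correlation (weighted)

r_xy      <- cor(x, y)
r_rebuilt <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
all.equal(r_xy, r_rebuilt)                    # TRUE
```

In this simulation the within and between group correlations were built to have opposite signs, which shows why the overall correlation alone says little about either level.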

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.69              0.63              0.50              0.58
     Letter.Group
             0.48
multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.48              0.40              0.25              0.34
     Letter.Group
             0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.59              0.58              0.49              0.58
     Letter.Group
             0.45
Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.34              0.34              0.24              0.33
     Letter.Group
             0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73
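The beta weights, squared multiple correlation, and variance inflation factors reported above all follow from standard matrix algebra on the correlation matrix. The following base R sketch shows that algebra on a small hypothetical matrix (the variable names and correlations are invented for illustration); it is not the setCor source and omits the canonical and set correlation summaries.

```r
# Hypothetical correlation matrix: two predictors (x1, x2) and one criterion (y)
R <- matrix(c(1.0, 0.5, 0.6,
              0.5, 1.0, 0.4,
              0.6, 0.4, 1.0), nrow = 3,
            dimnames = list(c("x1", "x2", "y"), c("x1", "x2", "y")))
x <- c("x1", "x2")
y <- "y"

Rxx <- R[x, x, drop = FALSE]                 # predictor intercorrelations
Rxy <- R[x, y, drop = FALSE]                 # predictor-criterion correlations

beta <- solve(Rxx, Rxy)                      # standardized regression weights
R2   <- colSums(beta * Rxy)                  # squared multiple correlation
vif  <- diag(solve(Rxx))                     # equals 1 / (1 - SMC) for each predictor

round(beta, 2)   # x1 = 0.53, x2 = 0.13
round(R2, 2)     # 0.37
round(vif, 2)    # 1.33 1.33
```

Because only correlations enter the computation, no raw data are needed; the sample size matters only for the standard errors and probability values.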

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.58              0.46              0.21              0.18
     Letter.Group
             0.30
multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
            0.331             0.210             0.043             0.032
     Letter.Group
            0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.44              0.35              0.17              0.14
     Letter.Group
             0.26
Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.19              0.12              0.03              0.02
     Letter.Group
             0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees
Four.Letter.Words              0.52     0.11          0.09      0.06
Suffixes                       0.11     0.60         -0.01      0.01
Letter.Series                  0.09    -0.01          0.75      0.28
Pedigrees                      0.06     0.01          0.28      0.66
Letter.Group                   0.13     0.03          0.37      0.20
                  Letter.Group
Four.Letter.Words         0.13
Suffixes                  0.03
Letter.Series             0.37
Pedigrees                 0.20
Letter.Group              0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x_1, x_2, ..., x_i) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
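The paths can be estimated with ordinary regressions, which makes the relation between the total, direct, and indirect effects concrete. The following base R sketch uses simulated data; the variable names, effect sizes, seed, and number of bootstrap resamples are all hypothetical, and it illustrates the general approach rather than the internals of the mediate function. For least squares estimates, the total effect of x on y equals the direct effect (with m controlled) plus the indirect effect ab.

```r
set.seed(123)                                 # arbitrary seed
n <- 200
x <- rnorm(n)                                 # hypothetical predictor
m <- 0.5 * x + rnorm(n)                       # hypothetical mediator
y <- 0.4 * m + 0.2 * x + rnorm(n)             # hypothetical criterion
d <- data.frame(x, m, y)

paths <- function(d) {
  a        <- coef(lm(m ~ x, data = d))[["x"]]      # path from x to m
  fit_y    <- coef(lm(y ~ x + m, data = d))
  b        <- fit_y[["m"]]                          # path from m to y, controlling for x
  c_direct <- fit_y[["x"]]                          # direct effect of x with m controlled
  c_total  <- coef(lm(y ~ x, data = d))[["x"]]      # total effect of x on y
  c(a = a, b = b, ab = a * b, c_direct = c_direct, c_total = c_total)
}

est <- paths(d)
all.equal(est[["c_total"]], est[["c_direct"]] + est[["ab"]])   # TRUE

# Percentile bootstrap of the indirect effect (500 resamples to keep this quick)
boot_ab <- replicate(500, paths(d[sample(n, replace = TRUE), ])[["ab"]])
quantile(boot_ab, c(0.025, 0.975))
```

The bootstrap is used for ab because the product of two regression weights is not normally distributed, so a confidence interval taken from the resampled values is preferred to a normal-theory standard error.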


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   SE = 0.31  t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32  t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32  with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


> mediate.diagram(preacher)

[Figure 16 plot: path diagram titled "Mediation model". THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS: c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2, 3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models". Predictors THERAPY and ATTRIB, criterion SATIS; values shown: 0.43, 0.4, 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

for speed. The default number of bootstraps is 5000.
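The quantities reported in a mediation analysis can be illustrated with a minimal base-R sketch. This is not the psych implementation: the data are simulated, and the names d, indirect, c_total, c_direct, and boot_ab are hypothetical. It shows only that, for ordinary least squares, the total effect c equals the direct effect c' plus the indirect effect ab, and how a percentile bootstrap interval for ab can be formed from 50 resamples.

```r
# Simulated data (an assumption for illustration, not one of the psych data sets)
set.seed(1)
n <- 300
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)             # mediator depends on x (path a)
y <- 0.3 * x + 0.4 * m + rnorm(n)   # outcome depends on x (c') and m (b)
d <- data.frame(x, m, y)

# Indirect effect ab = (slope of m on x) * (slope of y on m, controlling for x)
indirect <- function(dat) {
  a <- coef(lm(m ~ x, data = dat))["x"]
  b <- coef(lm(y ~ x + m, data = dat))["m"]
  unname(a * b)
}

c_total  <- unname(coef(lm(y ~ x, data = d))["x"])      # total effect c
c_direct <- unname(coef(lm(y ~ x + m, data = d))["x"])  # direct effect c'
ab       <- indirect(d)                                 # indirect effect ab

# Percentile bootstrap of ab with 50 resamples (a small number, chosen for speed)
boot_ab <- replicate(50, indirect(d[sample(n, replace = TRUE), ]))
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(c(c = c_total, c_direct = c_direct, ab = ab), 3))
print(round(ci, 3))
```

With only 50 resamples the interval is rough; a larger number of resamples gives a more stable interval at the cost of time.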

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

\[ R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i) \]

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

\[ R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx} . \]

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
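The set correlation can be computed directly from correlation matrices. The following is a minimal base-R sketch on simulated data, not the setCor code; the names x, y, M, lambda, set_R2, and avg_R2 are illustrative. It assumes the eigenvalues are taken from the matrix Rxx^-1 Rxy Ryy^-1 Ryx, whose non-zero eigenvalues are the squared canonical correlations, and it cross-checks them against cancor from the stats package.

```r
# Simulated predictor set (3 variables) and criterion set (2 variables)
set.seed(42)
n <- 200
x <- matrix(rnorm(n * 3), n, 3)
y <- cbind(x[, 1] + rnorm(n), x[, 2] + rnorm(n))

Rxx <- cor(x)      # correlations within the x set
Ryy <- cor(y)      # correlations within the y set
Rxy <- cor(x, y)   # correlations between the sets (3 x 2)

# Non-zero eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations
M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
k <- min(ncol(x), ncol(y))
lambda <- sort(Re(eigen(M)$values), decreasing = TRUE)[1:k]

set_R2 <- 1 - prod(1 - lambda)   # Cohen's set correlation
avg_R2 <- mean(lambda)           # average squared canonical correlation

# Cross-checks: canonical correlations from cancor and the determinant form
cc <- cancor(x, y)$cor
set_R2_det <- 1 - det(cor(cbind(x, y))) / (det(Rxx) * det(Ryy))

print(round(c(set_R2 = set_R2, avg_R2 = avg_R2), 3))
```

Because the product runs over all canonical correlations, a single large one is enough to push the set correlation up, which is why the average squared canonical correlation is the more conservative summary.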

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
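Why a correlation matrix is enough for multiple regression can be seen in a small base-R sketch. It uses simulated data and hypothetical names (d, beta_R, R2_R); it is not how setCor is written, only the standard result that the standardized weights solve Rxx β = rxy and that R2 is the sum of β times rxy.

```r
# Simulated data: two predictors and one criterion
set.seed(7)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 0.5 * d$x1 - 0.2 * d$x2 + rnorm(n)

# Everything below the next line uses only the correlation matrix
R   <- cor(d)
Rxx <- R[c("x1", "x2"), c("x1", "x2")]
rxy <- R[c("x1", "x2"), "y"]

beta_R <- unname(solve(Rxx, rxy))   # standardized beta weights
R2_R   <- sum(beta_R * rxy)         # squared multiple correlation

# The same quantities from raw data, after standardizing every variable
ds  <- as.data.frame(scale(d))
fit <- lm(y ~ x1 + x2, data = ds)
beta_lm <- unname(coef(fit)[c("x1", "x2")])

print(round(rbind(from_R = beta_R, from_lm = beta_lm), 4))
print(round(c(R2_from_R = R2_R, R2_from_lm = summary(fit)$r.squared), 4))
```

Significance tests additionally need the number of observations, which is why a sample size has to be supplied when only a matrix is given.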

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: first, setCor works on the correlation, covariance, or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the covariance matrix C rather than the raw data. The β weights that setCor reports below are standardized, so they differ in scale from the raw lm coefficients.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

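[Figure: path diagram titled "Moderation model". Predictors ACT, gender, and ACTXgndr; mediator education; criterion SATQ. Paths to education: 0.16, 0.09, -0.01; education -> SATQ = -0.04; ACT: c = 0.58, c' = 0.59; gender: c = -0.14, c' = -0.14; ACTXgndr: c = 0, c' = 0; remaining values shown: -0.04, -0.07, 0.02]

Figure 18: Moderated multiple regression requires the raw data.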

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
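As a quick arithmetic check, the two between-set summaries can be recomputed from the three squared canonical correlations printed in the output (0.050, 0.033, 0.008). The variable names here are illustrative only.

```r
# Squared canonical correlations as printed in the setCor output
lambda <- c(0.050, 0.033, 0.008)

set_R2 <- 1 - prod(1 - lambda)   # Cohen's set correlation R2
avg_R2 <- mean(lambda)           # average squared canonical correlation

print(round(set_R2, 2))   # 0.09
print(round(avg_R2, 2))   # 0.03
```

The set correlation (0.09) is three times the average squared canonical correlation (0.03), which illustrates how the product form accumulates even small canonical correlations.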

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or knitr packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
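The general idea behind converting a data frame to a LaTeX table can be sketched in a few lines of base R. The function df_to_latex below is a hypothetical stand-in written for illustration; it is not psych's df2latex, and its argument names and output layout are assumptions.

```r
# Hypothetical helper (not psych's df2latex): turn a numeric data frame
# into the lines of a LaTeX tabular environment
df_to_latex <- function(df, digits = 2) {
  # Round and format every cell with a fixed number of decimals
  cells <- format(round(as.matrix(df), digits), nsmall = digits)
  body  <- apply(cells, 1, function(row) paste(row, collapse = " & "))
  c(
    paste0("\\begin{tabular}{l", strrep("r", ncol(df)), "}"),
    paste0("Variable & ", paste(colnames(df), collapse = " & "), " \\\\"),
    "\\hline",
    paste0(rownames(df), " & ", body, " \\\\"),
    "\\end{tabular}"
  )
}

# Two rows of loadings taken from Table 2
loadings <- data.frame(MR1 = c(0.91, 0.89), MR2 = c(-0.04, 0.06),
                       row.names = c("Sentences", "Vocabulary"))
out <- df_to_latex(loadings)
cat(out, sep = "\n")
```

The psych functions add APA-style details such as captions and the suppression of small loadings; this sketch covers only the cell formatting and row assembly.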

                                                          7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random: Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex: Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor: One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz: Convert a correlation to the corresponding Fisher z score.

geometric.mean, also harmonic.mean, find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom: Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix: Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
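Several of the helpers in the list above wrap short computations. The following base-R sketch shows the arithmetic behind four of them. The functions fisher_z, geo_mean, harm_mean, and super_matrix are hypothetical stand-ins named for this example; the psych functions themselves may handle missing data and other details differently.

```r
# Hypothetical base-R stand-ins; not the psych implementations
fisher_z  <- function(r) 0.5 * log((1 + r) / (1 - r))   # Fisher r-to-z
geo_mean  <- function(x) exp(mean(log(x)))               # assumes x > 0
harm_mean <- function(x) 1 / mean(1 / x)                 # assumes x != 0

# Block "super matrix": A on the top left, B on the lower right, 0s elsewhere
super_matrix <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

z <- fisher_z(0.5)                            # equals atanh(0.5)
g <- geo_mean(c(1, 2, 4))                     # cube root of 1 * 2 * 4
h <- harm_mean(c(1, 2, 4))                    # 3 / (1 + 1/2 + 1/4)
S <- super_matrix(diag(2), matrix(1, 1, 3))   # 3 x 5 block matrix

print(round(c(z = z, geometric = g, harmonic = h), 4))
print(S)
```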

                                                          8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package = "psych")

                                                          10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                          11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

                                                          References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main='24 variables in a circumplex')
> dev.off()

null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.
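The pattern described in the caption can be illustrated with an idealized circumplex. The sketch below is an assumption made for illustration and is not how sim.circ generates its data: it places 24 hypothetical variables at equal angles around a circle and sets each correlation to the cosine of the angle between the two variables. Only base R is used.

```r
# Idealized circumplex (illustrative assumption, not the sim.circ algorithm):
# 24 variables equally spaced on a circle; correlation = cosine of the angular separation.
n_var  <- 24
angles <- 2 * pi * (0:(n_var - 1)) / n_var      # 0, 15, 30, ... degrees, in radians
R_circ <- cos(outer(angles, angles, "-"))       # 24 x 24 matrix of cosines

# First row: neighbour, quarter circle away, opposite, three quarters away, last variable
round(R_circ[1, c(2, 7, 13, 19, 24)], 2)
```

In this idealization the first variable correlates strongly with its neighbour (variable 2) and equally strongly with variable 24 in the far corner, while variables a quarter circle away are uncorrelated with it. Variables on opposite sides of the circle are negatively correlated, which real affect data may show only in weaker form.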


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option
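The Holm (1979) adjustment named in the table caption can be tried directly in base R with p.adjust. The p values below are hypothetical numbers chosen for illustration, not values taken from sat.act; the manual version is a sketch of the step-down rule, included only to show what the adjustment does.

```r
# Hypothetical raw p values for four tests (illustrative numbers only)
p_raw  <- c(0.001, 0.01, 0.03, 0.04)

# Base R implementation of the Holm adjustment
p_holm <- p.adjust(p_raw, method = "holm")

# The same rule by hand: sort the p values, multiply the i-th smallest by (m - i + 1),
# force the adjusted values to be non-decreasing, and cap them at 1
m <- length(p_raw)
o <- order(p_raw)
p_manual <- numeric(m)
p_manual[o] <- pmin(1, cummax((m - seq_len(m) + 1) * p_raw[o]))

rbind(raw = p_raw, holm = p_holm, manual = p_manual)
```

The smallest p value is multiplied by the number of tests, the next by one fewer, and so on, so the adjustment is less severe than a Bonferroni correction applied to every test.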


                                                            depending upon the input

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)

Correlation tests
Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability 0.23
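The arithmetic behind the first two cases can be sketched in base R. This is a hand calculation under the usual textbook formulas (a t test with n - 2 degrees of freedom and the Fisher z transformation), offered as an assumption about what is being computed rather than as the source of r.test; the variable names are hypothetical.

```r
# Case 1: a single correlation, r = .3 with n = 50
n <- 50
r <- 0.3
t_val <- r * sqrt(n - 2) / sqrt(1 - r^2)        # t statistic with n - 2 df
p_val <- 2 * pt(-abs(t_val), df = n - 2)        # two-sided p value
# 95% confidence interval via the Fisher z transformation (standard error 1 / sqrt(n - 3))
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

# Case 2: two independent correlations, .4 and .6, each with n = 30
n1 <- 30
n2 <- 30
z_diff <- (atanh(0.4) - atanh(0.6)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p_diff <- 2 * pnorm(-abs(z_diff))               # two-sided p value

round(c(t = t_val, p = p_val, lower = ci[1], upper = ci[2]), 3)
round(c(z = z_diff, p = p_diff), 2)
```

Rounded to two decimals, these reproduce the t value, confidence interval, and z value printed for cases 1 and 2 above (the sign of z depends only on which correlation is subtracted from which). The dependent-correlation tests of cases 3 and 4 need the additional correlations among the variables and are not sketched here.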

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
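The idea behind the test can be sketched in a few lines of base R. The helper steiger_chisq below is hypothetical and is not the source of cortest: it assumes the Fisher z version of the statistic, in which each unique off-diagonal correlation is transformed to z, squared, and scaled by n - 3. The built-in attitude data set stands in for sat.act so that the example needs no extra packages.

```r
# Hypothetical helper: Steiger-style chi-square test that all correlations are zero
steiger_chisq <- function(R, n) {
  z     <- atanh(R[lower.tri(R)])     # Fisher z of each unique off-diagonal correlation
  chisq <- (n - 3) * sum(z^2)         # each z has variance 1 / (n - 3) under the null
  df    <- length(z)                  # p * (p - 1) / 2 unique correlations
  list(chisq = chisq, df = df, p = pchisq(chisq, df, lower.tail = FALSE))
}

# attitude: 30 observations on 7 variables, shipped with base R
res <- steiger_chisq(cor(attitude), n = nrow(attitude))
res
```

With p variables the test has p(p - 1)/2 degrees of freedom, which is why the six sat.act variables give df = 15 in the output above and the seven attitude variables give df = 21 here.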

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
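How much φ underestimates the latent correlation can be seen in a small simulation. The sketch below uses base R only and makes two assumptions for illustration: a latent correlation of .6 and cuts at zero on both variables. For that special case the expected φ has a closed form, (2/π)·asin(ρ), which can be inverted to recover ρ. It is not the estimation procedure used by the tetrachoric function, which handles arbitrary cut points.

```r
set.seed(42)
n   <- 100000
rho <- 0.6                                     # latent correlation (illustrative value)

# Bivariate normal data with correlation rho
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# Dichotomize both latent variables at zero (a median split)
x <- as.numeric(z1 > 0)
y <- as.numeric(z2 > 0)

phi           <- cor(x, y)                     # Pearson r of the 0/1 scores is phi
phi_expected  <- 2 * asin(rho) / pi            # closed form when both cuts are at zero
rho_recovered <- sin(pi * phi / 2)             # invert the closed form to estimate rho

round(c(phi = phi, expected = phi_expected, recovered = rho_recovered), 3)
```

The observed φ falls well below the latent value of .6, and inverting the closed form brings the estimate back close to .6, which is the correction that a tetrachoric correlation provides in general.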

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix built from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
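The three steps just described (raise the offending eigenvalues, rescale, rebuild the matrix) can be sketched in base R. This is an illustration under stated assumptions, not the cor.smooth source: smooth_sketch and its eps argument are hypothetical, and the 3 x 3 matrix is a made-up impossible pattern rather than the burt data.

```r
# Sketch of eigenvalue smoothing for a correlation matrix that is not
# positive semi-definite. smooth_sketch and eps are hypothetical names;
# cor.smooth in psych is the real implementation and may differ in details.
smooth_sketch <- function(R, eps = 1e-8) {
  e    <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eps)              # make every eigenvalue positive
  vals <- vals * nrow(R) / sum(vals)       # rescale to sum to the number of variables
  S    <- e$vectors %*% diag(vals) %*% t(e$vectors)
  cov2cor(S)                               # restore a unit diagonal
}

# Made-up impossible pattern: 1 agrees with 2 and with 3, yet 2 and 3 disagree
R <- matrix(c(1.0,  0.9,  0.9,
              0.9,  1.0, -0.9,
              0.9, -0.9,  1.0), 3, 3)
min(eigen(R, symmetric = TRUE)$values)     # negative: R is not positive semi-definite
S <- smooth_sketch(R)
min(eigen(S, symmetric = TRUE)$values)     # positive after smoothing
```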

                                                            4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

> draw.tetra()

[Figure 14 plot: scatter of a bivariate normal distribution (rho = 0.5, phi = 0.33) with X and Y axes from -3 to 3, divided at the cut points τ and Τ into four quadrants (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with the marginal normal densities, dnorm(x), marking X > τ and Y > Τ.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Figure 15 plot: perspective view of the bivariate normal density over x and y (vertical axis z), titled "Bivariate density rho = 0.5".]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.
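Equation 1 can be verified directly on simulated two-level data. The sketch below uses only base R and is not the statsBy implementation; the group structure, effect sizes, and variable names are made up for illustration. Group means are assigned back to each case with ave, so the between group correlation is weighted by group size.

```r
# Sketch: verify r_xy = eta_xwg*eta_ywg*r_xywg + eta_xbg*eta_ybg*r_xybg
# on simulated two-level data (all names and values are illustrative).
set.seed(1)
g  <- rep(1:10, each = 20)                   # 10 groups of 20 cases
gx <- rnorm(10)                              # group effects on x
gy <- 0.6 * gx + rnorm(10)                   # group effects on y
x  <- gx[g] + rnorm(200)
y  <- gy[g] - 0.5 * (x - gx[g]) + rnorm(200) # opposite sign within groups

x_bg <- ave(x, g); x_wg <- x - x_bg          # group means and within group deviations
y_bg <- ave(y, g); y_wg <- y - y_bg

r_wg <- cor(x_wg, y_wg)                      # pooled within group correlation
r_bg <- cor(x_bg, y_bg)                      # correlation of the group means
eta_xwg <- cor(x, x_wg); eta_ywg <- cor(y, y_wg)
eta_xbg <- cor(x, x_bg); eta_ybg <- cor(y, y_bg)

lhs <- cor(x, y)
rhs <- eta_xwg * eta_ywg * r_wg + eta_xbg * eta_ybg * r_bg
all.equal(lhs, rhs)
```

The identity holds exactly because within group deviations are uncorrelated with the group means, so the cross terms of the covariance vanish.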

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5) #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
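To see why a correlation matrix is enough, note that the standardized weights are β = Rxx⁻¹ Rxy. The base R sketch below illustrates that algebra and checks it against lm on standardized raw data. It is not the setCor source, and mtcars with the chosen variables is only a stand-in example.

```r
# Sketch: standardized regression weights from a correlation matrix alone.
# mtcars and the chosen variables are illustrative; setCor does much more.
R <- cor(mtcars[, c("mpg", "wt", "hp", "qsec")])
x <- c("wt", "hp", "qsec"); y <- "mpg"

beta <- solve(R[x, x], R[x, y])      # beta = Rxx^{-1} Rxy
R2   <- sum(beta * R[x, y])          # squared multiple correlation
vif  <- diag(solve(R[x, x]))         # VIF = 1/(1 - SMC) for each predictor

# The same weights from the raw data, after standardizing every variable
z   <- as.data.frame(scale(mtcars[, c(y, x)]))
fit <- lm(mpg ~ wt + hp + qsec, data = z)
all.equal(unname(beta), unname(coef(fit)[x]))
```

Standard errors and significance tests need the number of observations as well, which a correlation matrix by itself does not carry.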

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.69     0.63          0.50      0.58         0.48

multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.48     0.40          0.25      0.34         0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary Sent.Completion First.Letters
     3.69       3.88            3.00          1.35

Unweighted multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.59     0.58          0.49      0.58         0.45

Unweighted multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.34     0.34          0.24      0.33         0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
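One way to see why the weights of the remaining predictors do not change while the multiple correlation drops is the Frisch-Waugh-Lovell result: regressing the residualized criterion on the residualized predictors reproduces the weights those predictors had in the full model. The base R sketch below works from raw data rather than a correlation matrix and is not the setCor source; mtcars, the variable roles, and the helper partial_out are illustrative assumptions.

```r
# Sketch: removing covariates z from both x and y leaves the weights of x unchanged.
# Roles are illustrative: y = mpg, x = wt and hp, z = disp and cyl.
full <- lm(mpg ~ wt + hp + disp + cyl, data = mtcars)

partial_out <- function(v, Z) resid(lm(v ~ Z))    # hypothetical helper: residualize v on Z
Z    <- as.matrix(mtcars[, c("disp", "cyl")])
y_z  <- partial_out(mtcars$mpg, Z)
wt_z <- partial_out(mtcars$wt, Z)
hp_z <- partial_out(mtcars$hp, Z)
part <- lm(y_z ~ wt_z + hp_z)

# Same weights for the x variables ...
all.equal(unname(coef(full)[c("wt", "hp")]),
          unname(coef(part)[c("wt_z", "hp_z")]))
# ... but less variance accounted for once z has been removed
c(full = summary(full)$r.squared, partialled = summary(part)$r.squared)
```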

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.58     0.46          0.21      0.18         0.30

multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
            0.331    0.210         0.043     0.032        0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion First.Letters
           1.02          1.02

Unweighted multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.44     0.35          0.17      0.14         0.26

Unweighted multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.19     0.12          0.03      0.02         0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
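The paths and the bootstrap can be sketched with ordinary lm calls. This is a minimal base R illustration on simulated data, not the mediate implementation: the data, effect sizes, and the helper paths are made up, a percentile interval is assumed, and 500 resamples are used only to keep the example fast. To keep the two effects of x apart, the code writes the effect of x alone as c_total and the effect of x with m in the model as c_prime.

```r
# Sketch of a single-mediator analysis with a percentile bootstrap of ab.
# Data are simulated; names and effect sizes are made up for illustration.
set.seed(123)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.3 * x + 0.4 * m + rnorm(n)
d <- data.frame(x, m, y)

paths <- function(d) {                            # hypothetical helper
  a   <- coef(lm(m ~ x, data = d))["x"]           # path from x to m
  fit <- coef(lm(y ~ x + m, data = d))
  c(a = unname(a),
    b = unname(fit["m"]),                         # path from m to y, controlling for x
    c_prime = unname(fit["x"]),                   # effect of x with m in the model
    c_total = unname(coef(lm(y ~ x, data = d))["x"]))  # effect of x alone
}

est <- paths(d)
ab  <- est["a"] * est["b"]                        # indirect effect
all.equal(unname(est["c_total"]), unname(est["c_prime"] + ab))

# Percentile bootstrap of the indirect effect
boot_ab <- replicate(500, {
  p <- paths(d[sample(n, replace = TRUE), ])
  p["a"] * p["b"]
})
quantile(boot_ab, c(0.025, 0.975))
```

With least squares estimates the effect of x alone equals the effect of x with m in the model plus ab, which is why the indirect effect can be read as the part of the relationship carried by the mediator.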


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS . The IV (X) was THERAPY . The mediating variable(s) = ATTRIB .

Total Direct effect(c) of THERAPY on SATIS = 0.76 SE = 0.31 t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 SE = 0.32 t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.04 Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"),x = c("education","age"),data = sat.act,std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"),x = c("education","age"),m = "ACT",data = sat.act,std=TRUE,n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000.

> mediate.diagram(preacher)

[Figure 16 path diagram, titled "Mediation model": THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, THERAPY to SATIS c = 0.76 and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The path from Therapy to Satisfaction has a total effect (c) of .76, which drops to .43 (c') once Attribution is included, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure 17 path diagram, titled "Regression Models": THERAPY and ATTRIB predicting SATIS, with the values 0.43, 0.4, and 0.21 shown on the paths.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic based upon the average canonical correlation might be more appropriate.
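Both summaries are simple functions of the squared canonical correlations. The sketch below first applies them to the four squared canonical correlations printed for the Thurstone example in section 5.1, then computes the same quantities from a partitioned correlation matrix. It is a base R illustration rather than the setCor source, and mtcars with the chosen variable sets is only a stand-in.

```r
# Sketch: Cohen's set correlation and the average squared canonical correlation.
# lambda holds the squared canonical correlations reported for the
# Thurstone example (y = 5:9, x = 1:4) in section 5.1.
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)
set_R2 <- 1 - prod(1 - lambda)       # Cohen's set correlation R2
avg_cc <- mean(lambda)               # average squared canonical correlation
round(c(set_R2 = set_R2, avg_cc = avg_cc), 2)

# The same quantities from a partitioned correlation matrix (illustrative data)
R <- cor(mtcars[, c("mpg", "qsec", "wt", "hp", "disp")])
x <- c("wt", "hp", "disp"); y <- c("mpg", "qsec")
M   <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
lam <- Re(eigen(M)$values)           # nonzero values are the squared canonical correlations
1 - prod(1 - lam)
1 - det(R) / (det(R[x, x]) * det(R[y, y]))   # equivalent determinant form
```

The first part gives 0.69 and 0.2, the two values printed by setCor for that example. One large canonical correlation is enough to push the set correlation up while the average stays small.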

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

                                                            For this example the analysis is done on the correlation matrix rather than the rawdata

                                                            gt C lt- cov(satactuse=pairwise)

                                                            gt model1 lt- lm(ACT~ gender + education + age data=satact)

                                                            gt summary(model1)

                                                            Call

                                                            lm(formula = ACT ~ gender + education + age data = satact)

                                                            Residuals

                                                            44

                                                            Call mediate(y = c(SATQ) x = c(ACT) m = education data = satact

                                                            mod = gender niter = 50 std = TRUE)

                                                            The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

                                                            Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

                                                            Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

                                                            Indirect effect (ab) of ACT on SATQ through education = -001

                                                            Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

                                                            Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

                                                            Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

                                                            Indirect effect (ab) of gender on SATQ through education = 0

                                                            Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

                                                            Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

                                                            Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

                                                            Indirect effect (ab) of ACTXgndr on SATQ through education = 0

                                                            Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

                                                            R2 of model = 037

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
           SATQ   se     t     Prob
ACT        0.58 0.03 19.25 0.00e+00
gender    -0.14 0.03 -4.78 2.10e-06
ACTXgndr   0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
           SATQ   se     t     Prob
ACT        0.59 0.03 19.26 0.00e+00
gender    -0.14 0.03 -4.63 4.37e-06
ACTXgndr   0.00 0.03  0.01 9.92e-01

a effect estimates
          education   se     t     Prob
ACT            0.16 0.04  4.22 2.77e-05
gender         0.09 0.04  2.50 1.28e-02
ACTXgndr      -0.01 0.04 -0.15 8.83e-01

b effect estimates
            SATQ   se     t  Prob
education  -0.04 0.03 -1.45 0.147

ab effect estimates
           SATQ  boot   sd lower upper
ACT       -0.01 -0.01 0.01     0     0
gender     0.00  0.00 0.00     0     0
ACTXgndr   0.00  0.00 0.00     0     0

[Figure: path diagram titled "Moderation model". ACT, gender, and their product ACTXgndr predict SATQ, with education as the mediator.]

Figure 18: Moderated multiple regression requires the raw data.
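The relation among the paths reported for ACT can be checked by hand. The following is a minimal sketch, not part of the psych output: it takes the rounded coefficients from the mediation output (the variable names are illustrative) and shows that the indirect effect is the product of the a and b paths, and that the total effect is approximately the direct effect plus the indirect effect.

```r
# Rounded path coefficients for ACT, copied from the mediation output
a_path   <- 0.16   # ACT -> education (a effect)
b_path   <- -0.04  # education -> SATQ (b effect)
c_total  <- 0.58   # total effect of ACT on SATQ (c)
c_direct <- 0.59   # direct effect of ACT on SATQ (c')

# The indirect (mediated) effect is the product of the a and b paths
ab_indirect <- a_path * b_path

# The total effect decomposes into direct plus indirect effects;
# agreement is only approximate because the inputs are rounded
c_rebuilt <- c_direct + ab_indirect

round(ab_indirect, 2)
round(c_rebuilt, 2)
```

To two decimals these agree with the reported ab effect for ACT and with the total effect c.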


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
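The last two lines of this summary follow directly from the multiple R-squared, the sample size, and the number of predictors. The sketch below uses base R only and the rounded values printed above (n = 700 is taken from the residual degrees of freedom plus the four estimated coefficients); the object names are illustrative.

```r
# Values reported in the regression summary (R2 is rounded)
r2 <- 0.0272   # Multiple R-squared
n  <- 700      # number of observations
k  <- 3        # number of predictors (gender, education, age)

df_resid <- n - k - 1   # residual degrees of freedom

# Adjusted (shrunken) R2 penalizes for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# F ratio for the test that all slopes are zero
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

# Upper-tail probability of F on k and df_resid degrees of freedom
p_value <- pf(f_stat, k, df_resid, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, F = f_stat, p = p_value), 5)
```

Up to rounding of the inputs, these reproduce the adjusted R-squared, the F-statistic, and the p-value shown in the summary.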

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohens Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohens Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
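The symmetry can be seen from the determinant form of Cohen's set correlation, R2 = 1 - |R| / (|Rxx| |Ryy|), which does not change when the roles of the two sets are exchanged. The sketch below uses base R only and a small made-up correlation matrix (not the sat.act data); set_r2 is a hypothetical helper written for this illustration and is not the psych implementation.

```r
# Illustrative correlation matrix (made-up values, not the sat.act data)
vars <- c("x1", "x2", "y1", "y2")
R <- matrix(c(1.0, 0.3, 0.3, 0.2,
              0.3, 1.0, 0.1, 0.3,
              0.3, 0.1, 1.0, 0.4,
              0.2, 0.3, 0.4, 1.0),
            nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

x <- c("x1", "x2")   # predictor set
y <- c("y1", "y2")   # criterion set

# Standardized regression (beta) weights: solve Rxx %*% beta = Rxy
beta <- solve(R[x, x], R[x, y])

# Squared multiple correlation for each criterion: sum of beta * r_xy
r2 <- colSums(beta * R[x, y])

# Cohen's set correlation: 1 - |R| / (|Rxx| |Ryy|)
set_r2 <- function(R, a, b) {
  1 - det(R[c(a, b), c(a, b)]) / (det(R[a, a]) * det(R[b, b]))
}

set_xy <- set_r2(R, x, y)   # criterion set y, predictor set x
set_yx <- set_r2(R, y, x)   # roles of the two sets exchanged

# Equivalent form: 1 - product of (1 - squared canonical correlations)
cc2 <- Re(eigen(solve(R[x, x]) %*% R[x, y] %*%
                solve(R[y, y]) %*% R[y, x])$values)
set_cc <- 1 - prod(1 - cc2)

round(c(set_xy = set_xy, set_yx = set_yx, set_cc = set_cc), 4)
```

All three quantities agree, which is the sense in which the set correlation is independent of the direction of the relationship, while the individual multiple R2 values in r2 are not.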

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
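To give a rough idea of what such a conversion involves, the sketch below formats the lower triangle of a correlation matrix as a LaTeX tabular using base R only. The function lower_tri_latex is a hypothetical helper written for this illustration; it is not how cor2latex is implemented, and the psych functions offer formatting options that this sketch ignores.

```r
# Hypothetical helper (not part of psych): format the lower triangle of a
# correlation matrix as a LaTeX tabular environment
lower_tri_latex <- function(r, digits = 2) {
  vars <- colnames(r)
  cells <- formatC(r, format = "f", digits = digits)  # fixed decimals
  cells[upper.tri(cells)] <- ""                       # blank upper triangle
  header <- paste(c("Variable", vars), collapse = " & ")
  rows <- vapply(seq_along(vars), function(i) {
    paste(c(vars[i], cells[i, ]), collapse = " & ")
  }, character(1))
  c(paste0("\\begin{tabular}{l", strrep("r", length(vars)), "}"),
    paste0(header, " \\\\"),
    "\\hline",
    paste0(rows, " \\\\"),
    "\\end{tabular}")
}

# A built-in data set, so that the sketch runs without psych installed
r <- cor(mtcars[, c("mpg", "hp", "wt")])
tex <- lower_tri_latex(r)
cat(tex, sep = "\n")
```

The resulting lines can be pasted into a LaTeX document; for APA style output from psych objects the functions named above remain the more convenient route.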

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00


                                                            7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random: Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex: Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor: One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz: Convert a correlation to the corresponding Fisher z score.

geometric.mean, also harmonic.mean, find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom: Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix: Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
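Several of the helpers in this list are short enough to write out directly. The sketch below shows base R equivalents of the Fisher z transformation, the geometric and harmonic means, and the block-diagonal combination described for superMatrix. These are illustrations of the underlying calculations under the descriptions given above, not the psych source code; super_matrix is a hypothetical stand-in name.

```r
# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r)) = atanh(r)
r <- c(0.1, 0.5, 0.9)
z <- atanh(r)

# Geometric mean: exponentiated mean of the logs (positive values only)
x <- c(1, 2, 4, 8)
geo_mean <- exp(mean(log(x)))

# Harmonic mean: reciprocal of the mean of the reciprocals
har_mean <- 1 / mean(1 / x)

# Block-diagonal "super matrix": A in the top left, B in the lower right,
# zeros in the two off-diagonal quadrants
super_matrix <- function(A, B) {
  rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
        cbind(matrix(0, nrow(B), ncol(A)), B))
}
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 2)
S <- super_matrix(A, B)
dim(S)
```

For the same positive data the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean; which one is appropriate depends on the kind of data, as the entry above notes.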

                                                            8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

                                                            9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")


                                                            10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                            11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

                                                            References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

                                                            Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

                                                            Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

                                                            Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

                                                            Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

                                                            Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

                                                            Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

                                                            Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

                                                            Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

                                                            Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

                                                            Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

                                                            Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144



> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

                                                              depending upon the input

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103 , r12 = 0.4 , r23 = 0.1 , r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)  #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23
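The first two cases can be checked by hand. The following is a minimal base R sketch, not the r.test source: it assumes the usual t test for a single correlation with n - 2 degrees of freedom, a Fisher z confidence interval with standard error 1/sqrt(n - 3), and a two-tailed z test for the difference between two independent correlations. All variable names are illustrative.

```r
# Case 1: significance and confidence interval of a single correlation
n <- 50
r <- 0.3
t_value <- r * sqrt(n - 2) / sqrt(1 - r^2)        # t with n - 2 df
p_t <- 2 * pt(-abs(t_value), df = n - 2)          # two-tailed probability
half_width <- qnorm(0.975) / sqrt(n - 3)          # interval half width in z units
ci <- tanh(atanh(r) + c(-1, 1) * half_width)      # back-transform to r units

# Case 2: difference between two independent correlations
n1 <- 30
n2 <- 30
r12 <- 0.4
r34 <- 0.6
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # standard error of z1 - z2
z_value <- abs(atanh(r12) - atanh(r34)) / se_diff
p_z <- 2 * pnorm(-z_value)                        # two-tailed probability

round(c(t = t_value, p = p_t, lower = ci[1], upper = ci[2]), 3)
round(c(z = z_value, p = p_z), 2)
```

Rounded to two decimals, these values agree with the r.test output for cases 1 and 2 (t = 2.18, interval 0.02 to 0.53, z = 0.99, probability 0.32).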

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
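The idea behind this test can be sketched in a few lines of base R. The helper below is hypothetical and is not the cortest source: it assumes the Fisher z variant of the statistic, in which the squared z scores of the unique off-diagonal correlations are summed and multiplied by n - 3, with degrees of freedom equal to the number of unique correlations. The correlation matrix is made up for illustration.

```r
# Hypothetical helper (not part of psych): chi square test that all
# off-diagonal population correlations are zero, using Fisher z scores.
steiger_chisq <- function(R, n) {
  r <- R[lower.tri(R)]           # unique off-diagonal correlations
  z <- atanh(r)                  # Fisher r-to-z transform
  chi2 <- (n - 3) * sum(z^2)     # each z has variance of about 1/(n - 3) under the null
  df <- length(r)                # p * (p - 1) / 2
  c(chi2 = chi2, df = df, p = pchisq(chi2, df, lower.tail = FALSE))
}

# Made-up 3 x 3 correlation matrix, for illustration only
R <- matrix(c(1.0, 0.3, 0.2,
              0.3, 1.0, 0.1,
              0.2, 0.1, 1.0), nrow = 3)
steiger_chisq(R, n = 100)

# An identity matrix gives a chi square of 0 and a probability of 1
steiger_chisq(diag(3), n = 100)
```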

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
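For the special case in which both variables are cut at their means, the relation between the latent correlation ρ and the observed φ has a closed form: the probability that both variables exceed zero is 1/4 + asin(ρ)/(2π), so φ = (2/π) asin(ρ). The base R sketch below works through this for ρ = 0.5. It assumes standard bivariate normal variables with both cuts at zero and does not call any psych function; the variable names are illustrative.

```r
rho <- 0.5                                   # latent (bivariate normal) correlation

# Probability that both X and Y fall above a cut at zero
p_both <- 0.25 + asin(rho) / (2 * pi)

# Marginal proportions above the cut
p_x <- 0.5
p_y <- 0.5

# phi is the Pearson correlation of the two dichotomized variables
phi <- (p_both - p_x * p_y) / sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))

# Inverting the relation recovers the latent correlation from phi
rho_recovered <- sin(pi * phi / 2)

round(c(phi = phi, rho_recovered = rho_recovered), 2)
```

With ρ = 0.5 this gives φ = 1/3, so dichotomizing both variables at the mean attenuates the correlation by a third. For cuts away from the mean there is no such closed form, and the tetrachoric correlation has to be estimated numerically.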

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix built from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
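The smoothing steps described above can be sketched in base R. The function below is a hypothetical illustration, not the cor.smooth source: the tolerance eig_tol is an assumed value, the final cov2cor step is my own addition to restore a unit diagonal, and the example matrix is made up so that it has a negative eigenvalue.

```r
# Hypothetical helper (not part of psych): eigenvalue smoothing of a
# correlation matrix that is not positive semi-definite.
smooth_sketch <- function(R, eig_tol = 1e-8) {
  e <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eig_tol)              # raise negative or zero eigenvalues
  vals <- vals * ncol(R) / sum(vals)           # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  S <- cov2cor(S)                              # make sure the diagonal is exactly 1
  dimnames(S) <- dimnames(R)
  S
}

# Made-up matrix with an impossible pattern of correlations
R_bad <- matrix(c( 1.0, 0.9, -0.9,
                   0.9, 1.0,  0.9,
                  -0.9, 0.9,  1.0), nrow = 3)

min(eigen(R_bad, symmetric = TRUE)$values)     # negative: not positive semi-definite
R_smooth <- smooth_sketch(R_bad)
min(eigen(R_smooth, symmetric = TRUE)$values)  # no longer negative
round(R_smooth, 2)
```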

> draw.tetra()

[Figure: a bivariate normal distribution with rho = 0.5 (phi = 0.33), cut at τ on X and at Τ on Y. The four quadrants are labeled by whether X and Y fall above or below their cuts, and the marginal normal densities dnorm(x) are drawn along each axis.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Figure: perspective plot of the bivariate density, rho = 0.5, with axes x, y, and z.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or with the group means.
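Equation 1 can be verified numerically. The sketch below uses base R only and simulated data; it is not the statsBy implementation, and the variable names and simulation settings are assumptions chosen for illustration. The between group correlation is computed on group means expanded back to the individual level, which weights each group by its size.

```r
set.seed(1)

# Simulated two level data: 4 groups of unequal size
group <- rep(1:4, times = c(10, 20, 30, 40))
x <- rnorm(100) + group
y <- 0.5 * rnorm(100) - 0.3 * x + 0.8 * group

# Between group part: each person's group mean
x_bg <- ave(x, group)
y_bg <- ave(y, group)

# Within group part: deviation from the group mean
x_wg <- x - x_bg
y_wg <- y - y_bg

# eta: correlation of the raw data with the within or between part
eta_x_wg <- cor(x, x_wg)
eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg)
eta_y_bg <- cor(y, y_bg)

# Pooled within group correlation and weighted between group correlation
r_wg <- cor(x_wg, y_wg)
r_bg <- cor(x_bg, y_bg)

# Right hand side of Equation 1
rebuilt <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg

all.equal(cor(x, y), rebuilt)
```

The identity holds because the within and between parts are uncorrelated, so the covariance of x and y is the sum of the within group and between group covariances.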

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)  #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
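Regression from a correlation matrix needs only matrix algebra: the standardized beta weights solve Rxx β = Rxy, R² is the sum of the products of the betas and the validities, and the variance inflation factors are the diagonal of the inverse of Rxx. The base R sketch below illustrates this; it is not the setCor source, and the 3 x 3 correlation matrix is made up for illustration.

```r
# Made-up correlation matrix: two predictors (x1, x2) and one criterion (y)
vars <- c("x1", "x2", "y")
R <- matrix(c(1.0, 0.5, 0.4,
              0.5, 1.0, 0.3,
              0.4, 0.3, 1.0),
            nrow = 3, dimnames = list(vars, vars))

Rxx <- R[c("x1", "x2"), c("x1", "x2")]        # predictor intercorrelations
Rxy <- R[c("x1", "x2"), "y", drop = FALSE]    # predictor-criterion correlations

beta <- solve(Rxx, Rxy)                       # standardized regression weights
R2 <- sum(beta * Rxy)                         # squared multiple correlation
vif <- diag(solve(Rxx))                       # 1 / (1 - SMC) for each predictor

round(beta, 2)
round(c(R = sqrt(R2), R2 = R2), 2)
round(vif, 2)
```

With only two predictors, each variance inflation factor equals 1/(1 - r²) for the correlation between them, here 1/(1 - 0.25).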

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
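The arithmetic behind regression from a correlation matrix is short enough to sketch in base R. The block below is a hedged illustration, not the setCor source: it assumes the built-in mtcars data as a stand-in, solves for the standardized weights as Rxx^-1 Rxy, takes the squared multiple correlations from Rxy and those weights, and reads the variance inflation factors off the diagonal of Rxx^-1. The variable choices and object names are hypothetical.

```r
# Illustrative only: standardized regression weights from a correlation matrix.
# mtcars and the variable choices are stand-ins for this sketch.
R <- cor(mtcars[, c("wt", "hp", "disp", "mpg", "qsec")])
x <- c("wt", "hp", "disp")   # predictor set
y <- c("mpg", "qsec")        # criterion set

Rxx <- R[x, x]
Rxy <- R[x, y]

beta <- solve(Rxx, Rxy)          # beta weights: Rxx^-1 Rxy
R2   <- colSums(Rxy * beta)      # squared multiple correlations, one per y
vif  <- diag(solve(Rxx))         # variance inflation factors, 1/(1 - SMC)

# Cross-check one criterion against lm() on standardized raw data
z   <- as.data.frame(scale(mtcars))
fit <- lm(mpg ~ wt + hp + disp, data = z)

round(beta, 2)
round(R2, 2)
round(vif, 2)
```

The weights for mpg agree with the lm coefficients on the standardized data, which is why raw data are not needed when only standardized weights and multiple correlations are wanted.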

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
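The residual matrix reported above can be understood as what is left of the criterion correlations once a set of variables has been removed. A minimal base R sketch of that idea follows. It assumes mtcars as a stand-in data set and hypothetical variable choices, and it is not taken from the setCor source: the residual matrix is Rkk - Rkr Rrr^-1 Rrk, where r indexes the removed variables and k the kept ones.

```r
# Illustrative only: residual (partial) covariances from a correlation matrix.
# mtcars and the variable choices are stand-ins for this sketch.
R <- cor(mtcars[, c("wt", "hp", "mpg", "qsec", "drat")])
removed <- c("wt", "hp")             # variables to partial out
kept    <- c("mpg", "qsec", "drat")  # variables whose residuals we want

Rkk <- R[kept, kept]
Rkr <- R[kept, removed]
Rrr <- R[removed, removed]

# Residual matrix: Rkk - Rkr Rrr^-1 Rrk; its diagonal is 1 - R2
residual <- Rkk - Rkr %*% solve(Rrr, t(Rkr))

# Rescaling the residual covariances gives partial correlations
partial_r <- cov2cor(residual)

# Cross-check against residuals from lm() on standardized raw data
z <- as.data.frame(scale(mtcars))
e <- residuals(lm(cbind(mpg, qsec, drat) ~ wt + hp, data = z))

round(residual, 2)
round(partial_r, 2)
```

The diagonal of the residual matrix is the unexplained variance of each kept variable, which matches the pattern in the output above, where the diagonal entries equal one minus the squared multiple correlations from the four-predictor model.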

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,\ldots,i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
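The path logic just described can be sketched in base R on simulated data. This is a hedged illustration rather than the mediate implementation: the data, effect sizes, and object names are all hypothetical, the total effect is written c_total and the direct effect c_prime to keep the two apart, and the bootstrap is a plain percentile bootstrap with 1000 resamples.

```r
# Illustrative only: a single-mediator model on simulated data.
# All names and effect sizes here are hypothetical.
set.seed(42)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.3 * x + 0.4 * m + rnorm(n)
dat <- data.frame(x, m, y)

# Point estimates of the paths
path_a  <- coef(lm(m ~ x, data = dat))["x"]        # x -> m
fit_y   <- lm(y ~ x + m, data = dat)
path_b  <- coef(fit_y)["m"]                        # m -> y, holding x constant
c_prime <- coef(fit_y)["x"]                        # direct effect of x
c_total <- coef(lm(y ~ x, data = dat))["x"]        # total effect of x
ab      <- path_a * path_b                         # indirect effect

# Percentile bootstrap of the indirect effect
boot_ab <- replicate(1000, {
  d <- dat[sample(n, replace = TRUE), ]
  coef(lm(m ~ x, data = d))["x"] * coef(lm(y ~ x + m, data = d))["m"]
})
ci <- quantile(boot_ab, c(0.025, 0.975))
```

With ordinary least squares the total effect decomposes exactly into the direct effect plus ab, so c_total equals c_prime + ab up to rounding error. The bootstrap is used because the sampling distribution of the product ab is generally not normal.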


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  SE = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  SE = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model". THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS: c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1, c(2,3), sobel, std=FALSE)

> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models". THERAPY to SATIS = 0.43; ATTRIB to SATIS = 0.4; THERAPY with ATTRIB = 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.


for speed. The default number of boot straps is 5000.
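A moderated regression of the kind described in the last bullet amounts to adding a product term to the model. The base R sketch below uses simulated data with hypothetical names and effect sizes. It forms the product from mean-centered variables, which is one common convention; whether this matches what mediate does internally is an assumption.

```r
# Illustrative only: moderated regression with a mean-centered product term.
# Simulated data; all names and effect sizes are hypothetical.
set.seed(1)
n   <- 1000
x   <- rnorm(n, mean = 5)
mod <- rbinom(n, 1, 0.5)
y   <- 0.5 * x + 0.2 * mod + 0.3 * x * mod + rnorm(n)

# Center the predictor and moderator, then form their product
xc   <- x - mean(x)
modc <- mod - mean(mod)
dat  <- data.frame(y, xc, modc, xc_modc = xc * modc)

fit <- lm(y ~ xc + modc + xc_modc, data = dat)

# The same interaction fitted on the uncentered variables, for comparison
fit_raw <- lm(y ~ x * mod)

round(coef(summary(fit)), 2)
```

Centering changes the lower-order coefficients and their interpretation but leaves the product-term coefficient unchanged, so the interaction estimate from fit matches the one from fit_raw.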

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the $i$th eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
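Both statistics can be computed from a correlation matrix in a few lines of base R. The sketch below is an illustration under stated assumptions and not the setCor source: it uses mtcars as a stand-in data set with hypothetical variable sets, and it takes the squared canonical correlations as the eigenvalues of the standard form Rxx^-1 Rxy Ryy^-1 Ryx. It cross-checks them against stats::cancor and against the determinant form of the set correlation.

```r
# Illustrative only: set correlation and average squared canonical correlation.
# mtcars and the variable choices are stand-ins for this sketch.
x <- c("wt", "hp", "disp")
y <- c("mpg", "qsec")
R <- cor(mtcars[, c(x, y)])

Rxx <- R[x, x]; Ryy <- R[y, y]; Rxy <- R[x, y]

# Eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations
M      <- solve(Rxx, Rxy) %*% solve(Ryy, t(Rxy))
lambda <- Re(eigen(M, only.values = TRUE)$values)
lambda <- sort(lambda, decreasing = TRUE)[seq_len(min(length(x), length(y)))]

set_R2     <- 1 - prod(1 - lambda)   # Cohen's set correlation
avg_can_R2 <- mean(lambda)           # average squared canonical correlation

# Cross-checks: stats::cancor() and the determinant form of the set correlation
cc      <- cancor(mtcars[, x], mtcars[, y])$cor
set_det <- 1 - det(R) / (det(Rxx) * det(Ryy))

round(c(set_R2 = set_R2, avg_can_R2 = avg_can_R2), 2)
```

Because the set correlation is one minus a product, a single large eigenvalue is enough to push it close to 1, whereas the average squared canonical correlation is pulled down by the small eigenvalues. That is the contrast the paragraph above describes.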

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation, covariance, or raw data matrix, and thus can report either standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")

> model1 <- lm(ACT ~ gender + education + age, data=sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  SE = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  SE = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14  SE = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  SE = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0  SE = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  SE = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0
R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

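[Figure: path diagram titled "Moderation model". ACT, gender, and their product ACTXgndr predict SATQ, with education as the mediator. Paths to education: 0.16, 0.09, -0.01; education to SATQ: -0.04; c = 0.58, -0.14, 0; c' = 0.59, -0.14, 0. Other edge labels in the plot: -0.04, -0.07, 0.02.]

Figure 18: Moderated multiple regression requires the raw data.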


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor

> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
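The significance tests that become available once the number of observations is specified follow from the R2 values alone. The base R sketch below recomputes the F statistics, their probabilities, and the shrunken R2 from the multiple R2 values reported above, taking n = 700 and k = 3 predictors. The formulas are the usual ones for multiple regression and the object names are hypothetical; this is not a claim about how setCor computes them internally.

```r
# Illustrative only: significance tests from R2 when only n.obs is known.
# R2 values and n are copied from the output above; k is the number of predictors.
R2 <- c(ACT = 0.0272, SATV = 0.0096, SATQ = 0.0359)
n  <- 700
k  <- 3

df1 <- k
df2 <- n - k - 1
F_stat      <- (R2 / df1) / ((1 - R2) / df2)
p_value     <- pf(F_stat, df1, df2, lower.tail = FALSE)
shrunken_R2 <- 1 - (1 - R2) * (n - 1) / df2

round(F_stat, 2)
signif(p_value, 3)
round(shrunken_R2, 4)
```

For ACT the results should agree closely with the F-statistic and adjusted R-squared in the lm summary. Small differences for the other criteria are expected because the R2 values used here are rounded to four decimals.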

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2    com
Sentences          0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary         0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion    0.83   0.04   0.00  0.73  0.27  1.00
First.Letters      0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes           0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series      0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees          0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group      -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings        2.64   1.86   1.5

MR1                1.00   0.59   0.54
MR2                0.59   1.00   0.52
MR3                0.54   0.52   1.00
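To make the idea of converting R output to a LaTeX table concrete, here is a small base R sketch that writes the lower triangle of a correlation matrix as a LaTeX tabular. It is not the code behind cor2latex or fa2latex. The helper name lower_cor_latex, its arguments, and the choice of the mtcars data are assumptions made for illustration.

```r
# Hypothetical helper (not part of psych): format the lower triangle of a
# correlation matrix as the lines of a LaTeX tabular environment.
lower_cor_latex <- function(r, digits = 2) {
  stopifnot(is.matrix(r), nrow(r) == ncol(r))
  vars <- colnames(r)
  if (is.null(vars)) vars <- paste0("V", seq_len(ncol(r)))
  cells <- formatC(r, format = "f", digits = digits)  # fixed number of decimals
  cells[upper.tri(cells)] <- ""                       # blank out the upper triangle
  body <- vapply(seq_len(nrow(r)), function(i) {
    paste0(paste(c(vars[i], cells[i, ]), collapse = " & "), " \\\\")
  }, character(1))
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(r)), "}"),
    "\\hline",
    paste0(paste(c("Variable", vars), collapse = " & "), " \\\\"),
    "\\hline",
    body,
    "\\hline",
    "\\end{tabular}")
}

r <- cor(mtcars[, c("mpg", "hp", "wt")])  # a small example correlation matrix
tab <- lower_cor_latex(r)
writeLines(tab)                           # paste the result into a LaTeX document
```

Variable names containing characters that are special in LaTeX (for example underscores) would need escaping, which this sketch does not attempt.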


                                                              7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random: Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex: Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor: One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz: Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom: Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix: Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
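Several of the helpers in this list wrap short calculations. The base R sketch below shows the arithmetic behind four of them (the Fisher z transformation, the geometric and harmonic means, reverse coding, and the block layout of a super matrix). It does not call psych, and the variable names and example values are illustrative assumptions.

```r
# Fisher r-to-z transformation: z = atanh(r)
r <- 0.5
z <- atanh(r)                      # same as 0.5 * log((1 + r) / (1 - r))

# Geometric and harmonic means of positive numbers
x <- c(1, 2, 4, 8)
geo_mean  <- exp(mean(log(x)))     # nth root of the product
harm_mean <- 1 / mean(1 / x)       # reciprocal of the mean reciprocal

# Reverse coding an item scored from 1 to 6: new = (max + min) - old
item <- c(1, 2, 6, 4)
reversed <- (6 + 1) - item

# A "super matrix": A in the upper left, B in the lower right, 0s elsewhere
A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 2)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))
S
```

The package functions add conveniences that this sketch omits, such as handling of missing values and of whole data frames.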

                                                              8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
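As an alternative to the package installer menu, a source install from the R console might look like the sketch below. It assumes the repository address given in the text is still being served and that the tools needed to build source packages are available. The names dev_repo and install_psych_dev are illustrative, not part of psych.

```r
# Assumed repository URL, taken from the text above
dev_repo <- "http://personality-project.org/r"

# Wrapped in a function so that nothing is downloaded until it is called
install_psych_dev <- function(repo = dev_repo) {
  install.packages("psych", repos = repo, type = "source")
}

# install_psych_dev()   # uncomment to install the development version
```

After installing, restarting R and running library(psych) followed by sessionInfo() is one way to confirm which version is attached.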

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")


                                                              10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                              11 SessionInfo

                                                              This document was prepared using the following settings

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                                              References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.




Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
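The Holm (1979) adjustment named in the caption of Table 1 can be illustrated with base R's p.adjust. The three p values below are hypothetical and are not taken from sat.act; this is only a sketch of the step-down rule, not a description of how corr.test arranges its output.

```r
# Hypothetical raw p values for three tests (illustration only)
p_raw <- c(0.01, 0.02, 0.03)

# Holm: multiply the i-th smallest p by (m - i + 1), then force the
# adjusted values to be non-decreasing and capped at 1
p_holm <- p.adjust(p_raw, method = "holm")

# The same rule written out by hand
m <- length(p_raw)
o <- order(p_raw)
by_hand <- pmin(1, cummax((m - seq_len(m) + 1) * p_raw[o]))[order(o)]

print(p_holm)
print(by_hand)
```

Both vectors should agree; the smallest p value is penalized most heavily, which is why many of the adjusted entries above the diagonal of Table 1 are larger than their raw counterparts below it.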

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18 with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)  #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5, r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2 with probability 0.23
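The first three cases can be checked by hand with textbook formulas in base R. The sketch below is not the r.test source code; the function names are hypothetical, and it assumes the usual t test and Fisher z interval for case 1, the Fisher z difference test for case 2, and Williams' t for case 3. Under those assumptions it reproduces the rounded values printed above.

```r
# Case 1: t test and Fisher z confidence interval for a single correlation
single_r_test <- function(n, r, conf = 0.95) {
  t_val <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p_val <- 2 * pt(-abs(t_val), df = n - 2)
  half  <- qnorm(1 - (1 - conf) / 2) / sqrt(n - 3)  # half-width on the z scale
  list(t = t_val, p = p_val, ci = tanh(atanh(r) + c(-1, 1) * half))
}

# Case 2: z test for the difference between two independent correlations
independent_r_test <- function(n, r12, r34, n2 = n) {
  z_val <- (atanh(r12) - atanh(r34)) / sqrt(1 / (n - 3) + 1 / (n2 - 3))
  list(z = z_val, p = 2 * pnorm(-abs(z_val)))
}

# Case 3: Williams' t for two dependent correlations that share variable 1
dependent_r_test <- function(n, r12, r13, r23) {
  det_r <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23
  r_bar <- (r12 + r13) / 2
  t_val <- (r12 - r13) * sqrt((n - 1) * (1 + r23) /
             (2 * ((n - 1) / (n - 3)) * det_r + r_bar^2 * (1 - r23)^3))
  list(t = t_val, p = 2 * pt(-abs(t_val), df = n - 3))
}

case1 <- single_r_test(50, 0.3)
case2 <- independent_r_test(30, 0.4, 0.6)
case3 <- dependent_r_test(103, r12 = 0.4, r13 = 0.5, r23 = 0.1)
print(case1); print(case2); print(case3)
```

Note that the sign of the z value in case 2 depends only on the order in which the two correlations are subtracted; the two-tailed probability is unaffected.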

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
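The statistic described above can be sketched in a few lines of base R. This is an illustration of the idea (sum the squared Fisher z values below the diagonal, scale by n - 3, and refer the result to chi square with one degree of freedom per correlation), not the cortest implementation; the function name steiger_chisq and the small example matrix are hypothetical.

```r
# Sketch: chi square test that all off-diagonal correlations are zero
steiger_chisq <- function(R, n) {
  z     <- atanh(R[lower.tri(R)])   # Fisher z of each unique correlation
  chisq <- (n - 3) * sum(z^2)
  df    <- length(z)                # p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p = pchisq(chisq, df, lower.tail = FALSE))
}

# Hypothetical 3 x 3 matrix with a single non-zero correlation of 0.5
R_example <- matrix(c(1.0, 0.5, 0.0,
                      0.5, 1.0, 0.0,
                      0.0, 0.0, 1.0), nrow = 3, byrow = TRUE)
result <- steiger_chisq(R_example, n = 103)
print(result)
```

For six variables, as in sat.act, the same counting rule gives the 15 degrees of freedom shown in the output above.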

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
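How much φ underestimates the latent correlation can be worked out exactly in one special case. Assuming both variables are cut at their means (cut points of zero on a standard bivariate normal), φ = (2/π) arcsin(ρ), so the tetrachoric estimate is simply the inverse of that relation. The helper names below are hypothetical, and for any other cut points the estimate has to be found numerically rather than with this shortcut.

```r
# Special case only: both cut points at zero on a standard bivariate normal
phi_from_rho <- function(rho) 2 * asin(rho) / pi
rho_from_phi <- function(phi) sin(pi * phi / 2)

phi <- phi_from_rho(0.5)    # the phi implied by a latent rho of 0.5
rho <- rho_from_phi(phi)    # recovering the latent correlation from phi
print(c(phi = phi, rho = rho))
```

With ρ = 0.5 this gives φ of one third, which agrees with the rho = 0.5, phi = 0.33 annotation on Figure 14.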

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix assembled from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
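The smoothing steps just described can be sketched in base R. This is not the cor.smooth source; the function name smooth_cor, the eigenvalue floor eps, and the 3 x 3 example matrix are all assumptions made for illustration.

```r
# Sketch of eigenvalue smoothing for a matrix that is not positive semi-definite
smooth_cor <- function(R, eps = 1e-6) {        # eps: assumed floor for eigenvalues
  e    <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, eps)                  # make the offending eigenvalues positive
  vals <- vals * ncol(R) / sum(vals)           # rescale to sum to the number of variables
  S    <- e$vectors %*% diag(vals, nrow = length(vals)) %*% t(e$vectors)
  cov2cor(S)                                   # restore a unit diagonal
}

# Hypothetical "impossible" correlation matrix (one eigenvalue is negative)
R_bad <- matrix(c( 1.0, 0.9, -0.9,
                   0.9, 1.0,  0.9,
                  -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

R_smooth <- smooth_cor(R_bad)
print(eigen(R_bad, symmetric = TRUE)$values)
print(round(R_smooth, 2))
print(eigen(R_smooth, symmetric = TRUE)$values)
```

The smoothed matrix is the closest the eigen decomposition can offer once the negative eigenvalue is raised to the floor, so the individual correlations shrink toward values that are jointly possible.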

                                                                4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

> draw.tetra()
[Plot: a bivariate normal distribution (rho = 0.5, phi = 0.33) cut at τ on X and Τ on Y into four quadrants (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with the marginal normal densities, dnorm(x), shown for X > τ and Y > Τ.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))
[Plot: perspective view of the bivariate density, rho = 0.5, with axes x, y, and z.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \tag{1}$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values, or the group means.
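Equation 1 is an exact identity, which can be confirmed on simulated data with base R alone. The sketch below does not call statsBy; the data, the group sizes, and all variable names are hypothetical. Each score is split into its group mean (the between part) and its deviation from that mean (the within part), and the four η values and two correlations are then recombined.

```r
set.seed(42)                           # arbitrary seed, for reproducibility only
group <- rep(1:5, each = 20)           # 5 hypothetical groups of 20 cases
x <- rnorm(100) + group
y <- 0.5 * x - 0.8 * group + rnorm(100)

# Between part: group mean repeated for each case; within part: deviation from it
x_bg <- ave(x, group); y_bg <- ave(y, group)
x_wg <- x - x_bg;      y_wg <- y - y_bg

# eta: correlation of the raw scores with their within or between parts
eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)

r_wg <- cor(x_wg, y_wg)                # pooled within group correlation
r_bg <- cor(x_bg, y_bg)                # correlation of group means, weighted by group size

r_xy      <- cor(x, y)
r_rebuilt <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
print(c(observed = r_xy, rebuilt = r_rebuilt, within = r_wg, between = r_bg))
```

In this made-up example the within and between correlations can differ in sign, which is exactly why the raw correlation alone does not license inferences at either level.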

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1 while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.69     0.63         0.50      0.58        0.48

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.48     0.40         0.25      0.34        0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary SentCompletion FirstLetters
     3.69       3.88           3.00         1.35

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.59     0.58         0.49      0.58        0.45

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.34     0.34         0.24      0.33        0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
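The beta weights, multiple R2, and VIF in output of this kind all follow from matrix algebra on the correlation matrix. The sketch below uses a small hypothetical matrix (two predictors and one criterion) rather than Thurstone, and it is not the setCor code; it only shows the standard formulas, beta = Rxx^-1 Rxy, R2 = beta' Rxy, and VIF as the diagonal of Rxx^-1.

```r
# Hypothetical correlation matrix: two predictors (x1, x2) and one criterion (y)
vars <- c("x1", "x2", "y")
R <- matrix(c(1.0, 0.5, 0.6,
              0.5, 1.0, 0.4,
              0.6, 0.4, 1.0),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

x_vars <- c("x1", "x2")
y_var  <- "y"

Rxx <- R[x_vars, x_vars]
Rxy <- R[x_vars, y_var, drop = FALSE]

beta <- solve(Rxx, Rxy)              # standardized regression weights
R2   <- as.numeric(t(beta) %*% Rxy)  # squared multiple correlation
vif  <- diag(solve(Rxx))             # 1 / (1 - SMC) for each predictor

print(round(beta, 2))
print(round(c(R = sqrt(R2), R2 = R2), 2))
print(round(vif, 2))
```

Because only correlations enter the calculation, no raw data are needed; the sample size matters only later, when standard errors and probabilities are requested.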

By specifying the number of subjects for a correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.58     0.46         0.21      0.18        0.30

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
          0.331    0.210        0.043     0.032       0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion FirstLetters
          1.02         1.02

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.44     0.35         0.17      0.14        0.26

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.19     0.12         0.03      0.02        0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
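A residual matrix of this kind is what remains of the y variables after the predictors have been partialled out, Ryy - Ryx Rxx^-1 Rxy. The sketch below applies that formula to a hypothetical three variable matrix in which every correlation is 0.5; it is an illustration of the algebra, not the setCor code, and the variable names are made up.

```r
# Hypothetical matrix: one variable to remove (z) and two outcomes (y1, y2)
vars <- c("z", "y1", "y2")
R <- matrix(0.5, nrow = 3, ncol = 3, dimnames = list(vars, vars))
diag(R) <- 1

z_var  <- "z"
y_vars <- c("y1", "y2")

Rzz <- R[z_var, z_var, drop = FALSE]
Ryz <- R[y_vars, z_var, drop = FALSE]

# Residual covariances of the y variables after z is removed
resid_cov <- R[y_vars, y_vars] - Ryz %*% solve(Rzz) %*% t(Ryz)

# Rescaling to a unit diagonal gives the partial correlations
partial_r <- cov2cor(resid_cov)

print(resid_cov)
print(partial_r)
```

The diagonal of the residual matrix is below 1 for the same reason as in the output above: it holds the variance left unexplained, not a correlation.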

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x_1, x_2, ..., x_i) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
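The paths and the bootstrap can be sketched with lm on simulated data. Nothing below uses mediate or the sobel data; the sample, the path values, the number of bootstrap samples, and the variable names are all hypothetical. The sketch also shows a standard ordinary least squares identity: the total effect of x on y equals the direct effect plus ab.

```r
set.seed(1)                              # arbitrary seed, for reproducibility only
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)                  # hypothetical path a = 0.5
y <- 0.4 * m + 0.3 * x + rnorm(n)        # hypothetical path b = 0.4, direct path = 0.3

a        <- unname(coef(lm(m ~ x))["x"])
fit_y    <- lm(y ~ x + m)
b        <- unname(coef(fit_y)["m"])
c_direct <- unname(coef(fit_y)["x"])     # effect of x with m in the model
c_total  <- unname(coef(lm(y ~ x))["x"]) # effect of x ignoring m
indirect <- a * b

# Percentile bootstrap of the indirect effect (500 resamples, an arbitrary choice)
boot_ab <- replicate(500, {
  i <- sample.int(n, n, replace = TRUE)
  a_i <- coef(lm(m[i] ~ x[i]))[2]
  b_i <- coef(lm(y[i] ~ x[i] + m[i]))[3]
  unname(a_i * b_i)
})
ci <- quantile(boot_ab, c(0.025, 0.975))

print(c(total = c_total, direct = c_direct, indirect = indirect))
print(ci)
```

The bootstrap is preferred because the product ab is not normally distributed even when a and b are, so an interval taken from the resampled products is more trustworthy than one built from a single standard error.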


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   SE = 0.31   t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32   t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17   Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
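As a quick arithmetic check on this output, the pieces can be combined by hand. The values below are the rounded figures as printed (a = 0.82, b = 0.40, a total effect of 0.76 written as c, and a direct effect after removing ATTRIB of 0.43 written as c'), so agreement is only expected to two decimals.

```latex
\begin{align*}
ab &= a \times b = 0.82 \times 0.40 \approx 0.33,\\
c  &= c' + ab = 0.43 + 0.33 = 0.76 .
\end{align*}
```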

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50.


> mediate.diagram(preacher)

[Figure: Mediation model path diagram. THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)

> setCor.diagram(preacher)

[Figure: Regression Models path diagram. THERAPY to SATIS = 0.43; ATTRIB to SATIS = 0.4; THERAPY with ATTRIB = 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

The reduced iteration count above was chosen for speed; the default number of bootstraps is 5000.
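A moderated regression of the kind described above adds a product term to the predictors. The base R sketch below illustrates this on simulated data; it is not the mediate function, and the variable names, the effect sizes, and the decision to mean-center both predictors before forming the product are assumptions of the example.

```r
# Simulated data; names and coefficients are illustrative assumptions.
set.seed(3)
n <- 400
x <- rnorm(n)                 # continuous predictor
w <- rbinom(n, 1, 0.5)        # binary moderator
y <- 0.5 * x + 0.2 * w + 0.3 * x * w + rnorm(n)

# Center the predictors, then form the product (moderation) term.
xc <- x - mean(x)
wc <- w - mean(w)
fit <- lm(y ~ xc + wc + I(xc * wc))

# Estimates, standard errors, t values, and probabilities.
round(coef(summary(fit)), 3)

# The same model without centering, for comparison.
fit_raw <- lm(y ~ x * w)
```

Centering changes the lower-order coefficients but leaves the coefficient of the product term unchanged, which is why the two fits can be compared directly on that term.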

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
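Both statistics can be computed directly from a correlation matrix. The following base R sketch uses simulated data and takes the eigenvalues of the matrix Rxx^-1 Rxy Ryy^-1 Ryx (the squared canonical correlations) as its starting point; the data, the variable split, and the object names are assumptions for illustration, and this is not the setCor implementation.

```r
# Simulated data: 3 x variables, 2 y variables (illustrative assumption).
set.seed(1)
n <- 300
X <- matrix(rnorm(n * 3), n, 3)
Y <- cbind(0.5 * X[, 1] + rnorm(n), rnorm(n))

R  <- cor(cbind(X, Y))
ix <- 1:3
iy <- 4:5
Rxx <- R[ix, ix]
Ryy <- R[iy, iy]
Rxy <- R[ix, iy]

# Eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations.
M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- sort(Re(eigen(M)$values), decreasing = TRUE)
lambda <- lambda[seq_len(min(length(ix), length(iy)))]  # keep the non-trivial ones

set_R2  <- 1 - prod(1 - lambda)   # set correlation
avg_cc2 <- mean(lambda)           # average squared canonical correlation

# Equivalent determinant form, used here only as a cross-check.
det_R2 <- 1 - det(R) / (det(Rxx) * det(Ryy))

round(c(set_R2 = set_R2, avg_cc2 = avg_cc2), 3)
```

Because the set correlation is one minus a product, a single large eigenvalue is enough to push it up, whereas the average is pulled down by the small ones.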

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
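To see why the raw data are not needed, note that standardized regression weights depend only on the correlations. The base R sketch below is an illustration on simulated data with assumed variable names; it solves for the weights from the correlation matrix and compares them with lm applied to standardized scores.

```r
# Simulated data; variable names and effects are illustrative assumptions.
set.seed(7)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 0.5 * d$x1 - 0.3 * d$x2 + rnorm(n)

R  <- cor(d)
xs <- c("x1", "x2")

# Standardized weights: beta = Rxx^-1 rxy; squared multiple R = beta' rxy.
beta <- solve(R[xs, xs], R[xs, "y"])
R2   <- sum(beta * R[xs, "y"])

# The same quantities from the raw data, after standardizing every column.
z   <- as.data.frame(scale(d))
fit <- lm(y ~ x1 + x2, data = z)

round(rbind(from_R = beta, from_lm = coef(fit)[xs]), 3)
```

Standard errors and significance tests additionally require the number of observations, which is why it has to be supplied when only a matrix is given.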

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")

> model1 <- lm(ACT ~ gender + education + age, data=sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   SE = 0.03   t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   SE = 0.03   t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = -0.02   Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14   SE = 0.03   t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   SE = 0.03   t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = -0.01   Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0   SE = 0.03   t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   SE = 0.03   t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = 0   Upper CI = 0
R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: Moderation model path diagram. ACT, gender, and ACTXgndr predict SATQ through education. Paths to education: 0.16 (ACT), 0.09 (gender), -0.01 (ACTXgndr); education to SATQ: -0.04. ACT: c = 0.58, c' = 0.59; gender: c = -0.14, c' = -0.14; ACTXgndr: c = 0, c' = 0. Other values shown in the diagram: -0.04, -0.07, 0.02.]

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
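The reported set correlation can be reproduced by hand from the three squared canonical correlations in the output (0.050, 0.033, and 0.008). The figures below use those rounded values, so the result is only expected to match to two decimals.

```latex
\begin{align*}
R^2 &= 1 - (1 - 0.050)(1 - 0.033)(1 - 0.008) = 1 - 0.9113 \approx 0.09,\\
\bar{\lambda} &= \tfrac{1}{3}\,(0.050 + 0.033 + 0.008) \approx 0.03 .
\end{align*}
```

The first line matches Cohen's set correlation and the second matches the average squared canonical correlation printed by setCor.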

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00

                                                                7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random: Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex: Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor: One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz: Convert a correlation to the corresponding Fisher z score.

geometric.mean, also harmonic.mean, find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom: Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix: Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
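Several of the helpers listed above wrap short calculations, and it can help to see the arithmetic spelled out. The base R sketch below shows what the Fisher z transformation, the geometric and harmonic means, and a "super matrix" amount to; it deliberately does not call the psych functions, and the input values and object names are assumptions chosen for illustration.

```r
# Fisher z: z = 0.5 * log((1 + r) / (1 - r)), which base R provides as atanh().
r <- c(0.1, 0.5, 0.9)
z <- atanh(r)
r_back <- tanh(z)                 # back-transform to correlations

# Geometric and harmonic means of positive values.
v   <- c(1, 2, 4, 8)
geo <- exp(mean(log(v)))          # nth root of the product
har <- 1 / mean(1 / v)            # reciprocal of the mean reciprocal

# A block "super matrix": A top left, B lower right, zeros elsewhere.
A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 2)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))

round(z, 3)
c(geometric = geo, harmonic = har)
S
```

The packaged versions add conveniences such as handling of missing values and data frames, so this sketch is only meant to show the underlying calculations.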

                                                                8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

                                                                9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
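The same installation can be sketched from the R command line rather than the installer menu. This is a minimal sketch, not a tested recipe: it assumes the repository named above is still reachable and that the tools needed to build a source package are installed. The helper name install_psych_dev is hypothetical and is not part of psych.

```r
# Repository named in the text above; that it is still live is an assumption.
dev_repo <- "http://personality-project.org/r"

# Hypothetical helper: install the development version of psych from source.
install_psych_dev <- function(repo = dev_repo) {
  install.packages("psych", repos = repo, type = "source")
}

# Not run automatically because it needs network access:
# install_psych_dev()
```

Wrapping the call in a function keeps the sketch from touching the network until the reader asks for it.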

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package = "psych")


                                                                10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                                                References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


                                                                  depending upon the input

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)  #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability 0.23
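The arithmetic behind cases 1 and 2 can be sketched in base R. This is an illustrative reconstruction using the standard t test for a correlation and the Fisher z transformation, not the source of r.test itself. The function names single_r_test and independent_r_test are hypothetical. Cases 3 and 4 rest on Steiger's (1980) formulas for dependent correlations and are not attempted here.

```r
# Case 1: a single correlation r from a sample of size n.
single_r_test <- function(n, r, conf = 0.95) {
  t  <- r * sqrt(n - 2) / sqrt(1 - r^2)      # t statistic on n - 2 df
  p  <- 2 * pt(-abs(t), df = n - 2)          # two-tailed probability
  se <- 1 / sqrt(n - 3)                      # standard error of Fisher z
  ci <- tanh(atanh(r) + c(-1, 1) * qnorm(1 - (1 - conf) / 2) * se)
  list(t = t, p = p, ci = ci)
}

# Case 2: two independent correlations from samples of size n and n2.
independent_r_test <- function(n, r12, r34, n2 = n) {
  # difference of the z transformed correlations over its standard error
  z <- (atanh(r12) - atanh(r34)) / sqrt(1 / (n - 3) + 1 / (n2 - 3))
  list(z = z, p = 2 * pnorm(-abs(z)))
}

case1 <- single_r_test(50, 0.3)
case2 <- independent_r_test(30, 0.4, 0.6)
print(round(c(t = case1$t, p = case1$p, lower = case1$ci[1], upper = case1$ci[2]), 3))
print(round(c(z = case2$z, p = case2$p), 3))
```

With the inputs used in the text (n = 50, r = .3 and n = 30, r = .4 versus .6), the sketch should agree with the printed r.test results to the two decimals shown, apart from the sign of z in case 2, which depends on the order of subtraction.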

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
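The logic of this test can be sketched in a few lines of base R. The sketch below sums the squared Fisher z transforms of the off-diagonal correlations and refers (n - 3) times that sum to a chi square distribution with p(p - 1)/2 degrees of freedom. The function name `cor_zero_test` is hypothetical, the built-in iris data stand in for sat.act, and the scaling by n - 3 is an assumption about how the statistic is formed rather than a copy of the cortest code.

```r
# Chi square test that all population correlations are zero
# (hypothetical helper; a sketch of the idea, not the psych implementation).
cor_zero_test <- function(x) {
  R <- cor(x, use = "pairwise")
  n <- nrow(x)
  p <- ncol(R)
  z <- atanh(R[lower.tri(R)])        # Fisher z of each unique correlation
  chi2 <- (n - 3) * sum(z^2)         # sum of squared z scores, scaled by n - 3
  df <- p * (p - 1) / 2              # number of unique correlations
  list(chi2 = chi2, df = df,
       p = pchisq(chi2, df, lower.tail = FALSE))
}

res <- cor_zero_test(iris[, 1:4])    # four numeric variables, so df = 6
print(res)
```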

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
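The underestimation can be seen in a small simulation. This base R sketch draws bivariate normal data with a latent correlation of 0.5, cuts both variables at zero, and finds φ as the Pearson correlation of the dichotomized scores. The closed form φ = (2/π) asin(ρ), and its inverse ρ = sin(πφ/2), hold only when both cuts are at the median; they are used here as a check and are not the general tetrachoric algorithm, which must be found iteratively.

```r
set.seed(42)
n   <- 100000
rho <- 0.5                                   # latent (bivariate normal) correlation

x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)    # cor(x, y) is approximately rho

x_d <- as.numeric(x > 0)                     # dichotomize at the median
y_d <- as.numeric(y > 0)

phi        <- cor(x_d, y_d)                  # Pearson r of the 0/1 data
phi_theory <- (2 / pi) * asin(rho)           # expected phi for median cuts
rho_back   <- sin(pi * phi / 2)              # recover rho (median cuts only)

print(round(c(phi = phi, phi_theory = phi_theory, rho_back = rho_back), 2))
```

With ρ = 0.5 the expected φ is 1/3, which matches the values shown in Figure 14 (rho = 0.5, phi = 0.33).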

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix formed from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
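The smoothing steps described above can be sketched in base R. The function name `smooth_cor`, the tolerance `eig_tol`, and the small example matrix are all assumptions made for illustration; cor.smooth itself may differ in its details.

```r
# Eigenvalue smoothing of a correlation matrix that is not positive definite
# (hypothetical helper; a sketch of the steps described in the text).
smooth_cor <- function(R, eig_tol = 1e-12) {
  e <- eigen(R, symmetric = TRUE)
  if (min(e$values) >= eig_tol) return(R)     # nothing to fix
  vals <- pmax(e$values, 100 * eig_tol)       # make the smallest eigenvalues positive
  vals <- vals * nrow(R) / sum(vals)          # rescale to sum to the number of variables
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  S <- cov2cor(S)                             # restore a unit diagonal
  dimnames(S) <- dimnames(R)
  S
}

# An impossible pattern: 1 correlates .9 with 2 and 3, but 2 and 3 correlate -.9
R_bad <- matrix(c( 1.0,  0.9,  0.9,
                   0.9,  1.0, -0.9,
                   0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)

R_smooth <- smooth_cor(R_bad)
print(round(eigen(R_bad, symmetric = TRUE)$values, 2))
print(round(R_smooth, 2))
```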

                                                                  4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics, and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

> draw.tetra()

[Figure 14 image: a bivariate normal distribution with rho = 0.5 and phi = 0.33, cut at τ on X and at Τ on Y. The four regions are labeled X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; and X < τ, Y < Τ. The marginal normal densities, dnorm(x), are drawn along each axis with the regions X > τ and Y > Τ marked.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Figure 15 image: perspective plot of the bivariate density (axes x, y, z) with rho = 0.5.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or with the group means.
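Equation 1 can be verified numerically with base R. This sketch uses the built-in iris data (an assumption made for illustration; it is not one of the psych examples) with Species as the grouping variable. Each score is split into its group mean and its deviation from that mean, and the overall correlation is then recomposed from the within and between group parts.

```r
x <- iris$Sepal.Length
y <- iris$Sepal.Width
g <- iris$Species                      # grouping variable

xb <- ave(x, g); yb <- ave(y, g)       # group mean assigned to each case (between part)
xw <- x - xb;    yw <- y - yb          # deviation from the group mean (within part)

eta_x_wg <- cor(x, xw); eta_y_wg <- cor(y, yw)   # data with within group values
eta_x_bg <- cor(x, xb); eta_y_bg <- cor(y, yb)   # data with group means

r_wg <- cor(xw, yw)                    # pooled within group correlation
r_bg <- cor(xb, yb)                    # between group correlation, weighted by group size
r_xy <- cor(x, y)                      # ordinary correlation ignoring groups

recomposed <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
print(round(c(r_xy = r_xy, r_wg = r_wg, r_bg = r_bg, recomposed = recomposed), 2))
```

In these data the overall correlation is negative although the pooled within group correlation is positive, which illustrates why inferences should not be carried from one level to the other.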

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)

faBy(sb,nfactors=5) #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                  0.09     0.07         0.25      0.21        0.20
Vocabulary                 0.09     0.17         0.09      0.16       -0.02
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.69     0.63         0.50      0.58        0.48

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.48     0.40         0.25      0.34        0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary SentCompletion FirstLetters
     3.69       3.88           3.00         1.35

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.59     0.58         0.49      0.58        0.45

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.34     0.34         0.24      0.33        0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
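The beta weights, squared multiple correlation, and variance inflation factors in output of this kind need only the correlation matrix. The following base R sketch shows the matrix algebra for a single criterion and checks it against lm on standardized raw data. It uses the built-in mtcars data as a stand-in (an assumption for illustration) and is not the setCor code itself.

```r
vars <- c("mpg", "wt", "hp", "disp")
R <- cor(mtcars[, vars])               # all that is needed is the correlation matrix

x <- c("wt", "hp", "disp")             # predictor set
y <- "mpg"                             # criterion

beta <- solve(R[x, x], R[x, y])        # standardized beta weights: Rxx^-1 rxy
R2   <- sum(beta * R[x, y])            # squared multiple correlation
vif  <- diag(solve(R[x, x]))           # 1/(1 - SMC) for each predictor

# The same weights from raw data, after standardizing every variable
z   <- as.data.frame(scale(mtcars[, vars]))
fit <- lm(mpg ~ wt + hp + disp, data = z)

print(round(rbind(from_R = beta, from_lm = coef(fit)[-1]), 2))
print(round(c(R2 = R2, lm_R2 = summary(fit)$r.squared), 2))
```

Standard errors cannot be found from the matrix alone, which is why the number of observations must be supplied to obtain t values and probabilities.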

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.58     0.46         0.21      0.18        0.30

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
          0.331    0.210        0.043     0.032       0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion FirstLetters
          1.02         1.02

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.44     0.35         0.17      0.14        0.26

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.19     0.12         0.03      0.02        0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
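The paths and the bootstrap can be sketched with ordinary lm calls in base R. Everything in this example is an assumption made for illustration: the data are simulated, the helper `paths` is hypothetical, and a simple percentile interval from 500 resamples is used. It is not the mediate function, which reports more and may compute its interval differently.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)                 # x affects the mediator
y <- 0.4 * m + 0.2 * x + rnorm(n)       # mediator and x both affect y
d <- data.frame(x, m, y)

# Estimate the paths for one data set (hypothetical helper)
paths <- function(d) {
  a   <- coef(lm(m ~ x, data = d))[["x"]]      # x -> m
  fit <- coef(lm(y ~ x + m, data = d))
  c(a = a,
    b = fit[["m"]],                            # m -> y, controlling for x
    c_direct = fit[["x"]],                     # x -> y, controlling for m
    c_total  = coef(lm(y ~ x, data = d))[["x"]])  # x -> y, ignoring m
}

est <- paths(d)
ab  <- est[["a"]] * est[["b"]]          # indirect effect

# Percentile bootstrap of the indirect effect
n_boot  <- 500
boot_ab <- replicate(n_boot, {
  p <- paths(d[sample(n, replace = TRUE), ])   # resample cases with replacement
  p[["a"]] * p[["b"]]
})
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(c(est, ab = ab), 2))
print(round(ci, 2))
```

For least squares estimates the total effect equals the direct effect plus ab exactly, which is why the indirect effect can also be read as the drop in the x coefficient once m is added to the model.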

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS . The IV (X) was THERAPY . The mediating variable(s) = ATTRIB .

Total Direct effect(c) of THERAPY on SATIS = 0.76 S.E. = 0.31 t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 S.E. = 0.32 t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.04 Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000.

> mediate.diagram(preacher)

[Figure 16 image: path diagram titled "Mediation model". THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS labeled c = 0.76 and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure 17 image: path diagram titled "Regression Models" with THERAPY and ATTRIB predicting SATIS. Values shown: 0.43 (THERAPY), 0.4 (ATTRIB), and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

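where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

The eigenvalues of this matrix are the squared canonical correlations between the two sets.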
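The set correlation formula can be checked in base R. This sketch assumes the standard canonical correlation matrix, the product of the inverse of Rxx, Rxy, the inverse of Ryy, and Ryx, whose eigenvalues are the squared canonical correlations. It uses the built-in mtcars data as a stand-in and compares the result with the equivalent determinant form, 1 - |R| / (|Rxx| |Ryy|). It also recomputes the summary values from the four squared canonical correlations printed in the first Thurstone example of section 5.1.

```r
xs <- c("wt", "hp")                    # first set
ys <- c("mpg", "qsec")                 # second set
R   <- cor(mtcars[, c(xs, ys)])
Rxx <- R[xs, xs]; Ryy <- R[ys, ys]; Rxy <- R[xs, ys]

# Eigenvalues of this product are the squared canonical correlations
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M, only.values = TRUE)$values)

set_R2     <- 1 - prod(1 - lambda)                    # set correlation
set_R2_det <- 1 - det(R) / (det(Rxx) * det(Ryy))      # same value from determinants
print(round(c(set_R2 = set_R2, set_R2_det = set_R2_det), 3))

# Squared canonical correlations as printed for the Thurstone example
printed <- c(0.6280, 0.1478, 0.0076, 0.0049)
print(round(1 - prod(1 - printed), 2))   # Cohen's set correlation R2
print(round(mean(printed), 2))           # average squared canonical correlation
```

The Thurstone values show the problem discussed next: one large canonical correlation is enough to give a set correlation of 0.69 even though the average squared canonical correlation is only 0.2.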

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic, based upon the average canonical correlation, might then be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ . The IV (X) was ACT gender ACTXgndr . The mediating variable(s) = education .

Total Direct effect(c) of ACT on SATQ = 0.58 S.E. = 0.03 t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59 S.E. = 0.03 t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.02 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14 S.E. = 0.03 t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14 S.E. = 0.03 t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.01 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 S.E. = 0.03 t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0 S.E. = 0.03 t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = 0 Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00

                                                                  gender -014 003 -463 437e-06

                                                                  ACTXgndr 000 003 001 992e-01

                                                                  a effect estimates

                                                                  education se t Prob

                                                                  ACT 016 004 422 277e-05

                                                                  gender 009 004 250 128e-02

                                                                  ACTXgndr -001 004 -015 883e-01

                                                                  b effect estimates

                                                                  SATQ se t Prob

                                                                  education -004 003 -145 0147

                                                                  ab effect estimates

                                                                  SATQ boot sd lower upper

                                                                  ACT -001 -001 001 0 0

                                                                  gender 000 000 000 0 0

                                                                  ACTXgndr 000 000 000 0 0

                                                                  Moderation model

                                                                  ACT

                                                                  gender

                                                                  ACTXgndr

                                                                  SATQ

                                                                  education016 c = 058

                                                                  c = 059

                                                                  009 c = minus014

                                                                  c = minus014

                                                                  minus001 c = 0

                                                                  c = 0

                                                                  minus004

                                                                  minus004

                                                                  minus007

                                                                  002

                                                                  Figure 18 Moderated multiple regression requires the raw data

                                                                  45

                                                                  Min 1Q Median 3Q Max

                                                                  -252458 -32133 07769 35921 92630

                                                                  Coefficients

                                                                  Estimate Std Error t value Pr(gt|t|)

                                                                  (Intercept) 2741706 082140 33378 lt 2e-16

                                                                  gender -048606 037984 -1280 020110

                                                                  education 047890 015235 3143 000174

                                                                  age 001623 002278 0712 047650

                                                                  ---

                                                                  Signif codes 0 0001 001 005 01 1

                                                                  Residual standard error 4768 on 696 degrees of freedom

                                                                  Multiple R-squared 00272 Adjusted R-squared 002301

                                                                  F-statistic 6487 on 3 and 696 DF p-value 00002476
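The idea that a regression can be run from a covariance or correlation matrix alone can be checked in base R. The following is a minimal sketch, not the setCor implementation: it assumes the built-in mtcars data as a stand-in for sat.act, and the variable choice (mpg predicted from wt, hp and qsec) is arbitrary. Raw weights solve the normal equations on the covariance matrix; standardized weights solve the same equations on the correlation matrix.

```r
# Sketch: regression weights recovered from summary matrices alone.
# mtcars is a stand-in data set (an assumption); the variables are arbitrary.
x_names <- c("wt", "hp", "qsec")
y_name  <- "mpg"

# Covariance matrix of predictors and criterion (plays the role of C above)
C <- cov(mtcars[, c(x_names, y_name)])

# Raw weights: solve Cxx %*% b = Cxy
b_raw <- solve(C[x_names, x_names], C[x_names, y_name])

# Standardized weights: the same equations on the correlation matrix
R <- cov2cor(C)
b_std <- solve(R[x_names, x_names], R[x_names, y_name])

# Squared multiple correlation from the matrix: sum of beta * validity
R2 <- sum(b_std * R[x_names, y_name])

# Reference fit on the raw data for comparison
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
print(cbind(from_matrix = b_raw, from_lm = coef(fit)[x_names]))
print(c(R2_from_matrix = R2, R2_from_lm = summary(fit)$r.squared))
```

The matrix solution and the lm fit agree because the slopes depend on the data only through the covariances; the sample size is needed only for the significance tests.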

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01
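The shrunken R2 and F entries for ACT can be reproduced by hand from the multiple R2, the number of observations, and the number of predictors. The sketch below uses the textbook adjusted-R2 and F formulas, which appear to be what is reported; that they match the internals of setCor exactly is an assumption. The three input values are copied from the output.

```r
# Values copied from the output (ACT regressed on gender, education, age)
R2 <- 0.0272   # multiple R2 for ACT
n  <- 700      # n.obs passed to setCor
k  <- 3        # number of predictors

df1 <- k
df2 <- n - k - 1                       # 696, as in "degrees of freedom of regression"

# Textbook formulas (assumed to be those behind the printed values)
shrunken_R2 <- 1 - (1 - R2) * (n - 1) / df2
F_stat      <- (R2 / df1) / ((1 - R2) / df2)
p_value     <- pf(F_stat, df1, df2, lower.tail = FALSE)

print(round(c(shrunken_R2 = shrunken_R2, F = F_stat), 4))
print(signif(p_value, 3))
```

Up to rounding of the inputs, these agree with the Shrunken R2 and F rows for ACT and with the adjusted R-squared and F-statistic of the lm summary.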

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
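The symmetry of the set correlation can be illustrated with Cohen's (1982) determinant form, R2 = 1 - |R| / (|Rxx| |Ryy|), where R is the full correlation matrix of both sets. The sketch below uses simulated data (an assumption, chosen only so the example is self-contained) and a hypothetical helper set_R2; it also checks the equivalent form based on the squared canonical correlations.

```r
set.seed(42)
n <- 500

# Simulated stand-in data: three predictors and three criteria
g <- rnorm(n)
X <- cbind(x1 = g + rnorm(n), x2 = rnorm(n), x3 = 0.5 * g + rnorm(n))
Y <- cbind(y1 = g + rnorm(n), y2 = rnorm(n), y3 = X[, "x2"] + rnorm(n))

R <- cor(cbind(X, Y))
x <- 1:3
y <- 4:6

# Hypothetical helper: Cohen's set correlation from a correlation matrix
set_R2 <- function(R, a, b) {
  ab <- c(a, b)
  1 - det(R[ab, ab]) / (det(R[a, a]) * det(R[b, b]))
}

r2_y_from_x <- set_R2(R, x, y)   # X as predictors, Y as criteria
r2_x_from_y <- set_R2(R, y, x)   # roles reversed

# Equivalent form: 1 - product of (1 - squared canonical correlations)
rho2 <- cancor(X, Y)$cor^2
r2_canonical <- 1 - prod(1 - rho2)

print(c(r2_y_from_x, r2_x_from_y, r2_canonical))
```

All three values coincide: swapping the sets only permutes rows and columns of R, which leaves the determinants unchanged.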

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
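To make the kind of conversion these functions perform concrete, here is a minimal base R sketch that writes a small numeric data frame as a LaTeX tabular. The helper df_to_latex_sketch is hypothetical and is not the psych df2latex function; its arguments and layout are assumptions for illustration only. The two rows of loadings are taken from Table 2.

```r
# Hypothetical helper (not part of psych): write a numeric data frame as a
# LaTeX table, rounding every entry to a fixed number of digits.
df_to_latex_sketch <- function(df, caption = "", digits = 2) {
  cells <- format(round(as.matrix(df), digits), nsmall = digits)
  body  <- apply(cells, 1, paste, collapse = " & ")
  rows  <- paste0(rownames(df), " & ", body, " \\\\")
  header <- paste0("Variable & ", paste(colnames(df), collapse = " & "), " \\\\")
  c("\\begin{table}[htpb]",
    paste0("\\caption{", caption, "}"),
    paste0("\\begin{tabular}{l", strrep("r", ncol(df)), "}"),
    "\\hline", header, "\\hline", rows, "\\hline",
    "\\end{tabular}",
    "\\end{table}")
}

# Two rows of loadings copied from Table 2
loadings <- data.frame(MR1 = c(0.91, 0.89), MR2 = c(-0.04, 0.06),
                       row.names = c("Sentences", "Vocabulary"))
tex <- df_to_latex_sketch(loadings, caption = "Two rows of Table 2")
writeLines(tex)
```

The psych functions do considerably more (APA style rules, factor intercorrelations, suppression of small loadings), so this is only a picture of the basic idea.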

                                                                  7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
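Several of these helpers are short enough to sketch in base R, which may clarify what they compute. The functions below are illustrative stand-ins written for this example (the names fisher_z, geometric_mean, harmonic_mean and super_matrix are hypothetical), not the psych implementations, and they omit the argument checking and missing-data handling a package function would have.

```r
# Fisher r-to-z transform: z = 0.5 * log((1 + r) / (1 - r)), i.e. atanh(r)
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))

# Geometric and harmonic means for strictly positive data
geometric_mean <- function(x) exp(mean(log(x)))
harmonic_mean  <- function(x) 1 / mean(1 / x)

# Block-diagonal "super matrix": A top left, B lower right, zeros elsewhere
super_matrix <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

x <- c(1, 2, 4, 8)
print(fisher_z(0.5))
print(c(arithmetic = mean(x), geometric = geometric_mean(x),
        harmonic = harmonic_mean(x)))
print(super_matrix(diag(2), matrix(1, 1, 2)))
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean, as the printed values show.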

                                                                  8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
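As a sketch of the kind of scaling demonstration a distance matrix such as cities supports, classical multidimensional scaling can be run with cmdscale from the stats package. The example assumes the eurodist data set that ships with base R (road distances between 21 European cities) as a stand-in, so that it runs without psych installed.

```r
# Classical (metric) multidimensional scaling of a distance matrix.
# eurodist is a stand-in (an assumption) for the cities matrix described above.
coords <- cmdscale(eurodist, k = 2)      # 21 cities placed in 2 dimensions

# How well do distances in the 2-d solution reproduce the original ones?
fitted_d    <- dist(coords)
fit_quality <- cor(as.vector(eurodist), as.vector(fitted_d))

print(head(round(coords, 0)))
print(round(fit_quality, 3))
```

The orientation of the recovered map is arbitrary (it may be rotated or reflected relative to a conventional map), since only the distances are used.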

                                                                  9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
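From the R console, the same repository can be passed to install.packages. The wrapper below is a hedged sketch: the function name install_psych_dev is hypothetical, and the choice of type = "source" is an assumption based on the development version being distributed as a source file. The call itself is left commented out because it needs network access and modifies the local library.

```r
# Hypothetical convenience wrapper around install.packages().
# repo is the repository URL given in the text; type = "source" is an assumption.
install_psych_dev <- function(repo = "http://personality-project.org/r") {
  install.packages("psych", repos = repo, type = "source")
}

# install_psych_dev()   # uncomment to install the development version

# Report the installed version, if psych is available
if (requireNamespace("psych", quietly = TRUE)) {
  print(packageVersion("psych"))
}
```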

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")

                                                                  10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book An introduction to Psychometric Theory with Applications in R (Revelle, in prep)). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html A short guide to R.

                                                                  11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

                                                                  References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis. 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

                                                                  Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

                                                                  Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

                                                                  Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

                                                                  Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
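The underestimation can be seen in a small simulation. The following is a minimal base R sketch, not psych code; the variable names, sample size, and seed are illustrative assumptions. It draws bivariate normal data with a latent correlation of 0.5, cuts both variables at 0, and compares φ with the latent correlation. For cuts at the medians, φ has the closed form 2 asin(ρ)/π, so the latent value can be recovered as sin(πφ/2); for other cut points the estimate has to be found by optimization, which is what the tetrachoric function is for.

```r
# Sketch: phi computed on dichotomized data underestimates the latent correlation.
set.seed(42)
n   <- 100000
rho <- 0.5

# Bivariate normal data with correlation rho
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Dichotomize both variables at 0 (the median)
xd <- as.numeric(x > 0)
yd <- as.numeric(y > 0)

latent_r   <- cor(x, y)     # Pearson r on the continuous variables
phi        <- cor(xd, yd)   # Pearson r on the 0/1 variables is phi
phi_theory <- 2 * asin(rho) / pi   # closed form for median cuts, 1/3 when rho = 0.5

# For median cuts only, the latent correlation can be recovered in closed form
rho_recovered <- sin(pi * phi / 2)

round(c(latent_r = latent_r, phi = phi, phi_theory = phi_theory,
        rho_recovered = rho_recovered), 2)
```

The theoretical value of 1/3 agrees with the rho = 0.5, phi = 0.33 annotation in Figure 14.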

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix assembled from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This can also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
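The smoothing steps described above can be sketched in base R. This is an illustration of the idea under stated assumptions, not the cor.smooth source: the 3 x 3 matrix is made up for the example, and the floor `eps` used for the offending eigenvalue is an arbitrary choice.

```r
# Sketch: eigenvalue smoothing of a matrix that is not positive semi-definite.
# Hypothetical "correlation" matrix: variables 2 and 3 each correlate .9 with
# variable 1 but 0 with each other, which is impossible for real data.
R <- matrix(c(1.0, 0.9, 0.9,
              0.9, 1.0, 0.0,
              0.9, 0.0, 1.0), nrow = 3, byrow = TRUE)

e <- eigen(R, symmetric = TRUE)
e$values                       # the smallest eigenvalue is negative

eps  <- 100 * .Machine$double.eps        # assumed floor for bad eigenvalues
vals <- pmax(e$values, eps)              # make every eigenvalue positive
vals <- vals * nrow(R) / sum(vals)       # rescale to sum to the number of variables

# Rebuild the matrix and restore a unit diagonal
R_smooth <- e$vectors %*% diag(vals) %*% t(e$vectors)
R_smooth <- cov2cor(R_smooth)

round(R_smooth, 2)
min(eigen(R_smooth, symmetric = TRUE)$values)   # no longer negative (up to rounding)
```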

> draw.tetra()

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. [Plot: a bivariate normal distribution with rho = 0.5 cut at thresholds τ on X and Τ on Y into four quadrants, giving phi = 0.33, with the marginal normal densities shown along each axis.]

> draw.cor(expand=20,cuts=c(0,0))

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. [Plot: perspective view of the bivariate density, rho = 0.5.]

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups ($r_{wg}$) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
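Equation 1 is an exact identity, which can be checked numerically. The sketch below uses base R only and simulated data; the group sizes, effect sizes, and variable names are assumptions made for the illustration, and it is not how statsBy is implemented. Each score is split into its group mean and its deviation from that mean, and the four η values and two correlations are then combined as in Equation 1.

```r
# Sketch: numerical check of the within/between decomposition in Equation 1.
set.seed(1)
g  <- rep(1:5, each = 20)               # 5 groups of 20 cases
gx <- rnorm(5)                          # group effects on x
gy <- 0.8 * gx + rnorm(5, sd = 0.5)     # group effects on y (positively related)
e  <- rnorm(100)                        # within-group part of x
x  <- gx[g] + e
y  <- gy[g] - 0.5 * e + rnorm(100)      # negative relation within groups

# Split each variable into group means (between) and deviations (within)
xb <- ave(x, g); xw <- x - xb
yb <- ave(y, g); yw <- y - yb

# eta: correlation of the raw data with the within or between part
eta_xwg <- cor(x, xw); eta_ywg <- cor(y, yw)
eta_xbg <- cor(x, xb); eta_ybg <- cor(y, yb)

r_wg <- cor(xw, yw)   # pooled within-group correlation
r_bg <- cor(xb, yb)   # correlation of group means, weighted by group size

lhs <- cor(x, y)
rhs <- eta_xwg * eta_ywg * r_wg + eta_xbg * eta_ybg * r_bg
all.equal(lhs, rhs)
```

Because the within and between parts are uncorrelated by construction, the two sides agree to machine precision even though the within and between correlations here differ in sign.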

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
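The idea of randomizing the grouping variable can be illustrated without psych. The following base R sketch is an assumption-laden stand-in, not the statsBy.boot implementation: `between_r` is a hypothetical helper, the data are simulated, and only one statistic (the correlation of group means) is tracked. Shuffling the group labels breaks any real group structure, so the shuffled results show what between group correlations arise by chance.

```r
# Sketch: permuting the grouping variable to get a chance distribution.
set.seed(11)
g <- rep(1:10, each = 15)        # 10 groups of 15 cases
x <- rnorm(150) + g / 5          # both variables share a group-level trend
y <- rnorm(150) + g / 5

# Hypothetical helper: correlation between the group means of x and y
between_r <- function(x, y, g) {
  cor(as.vector(tapply(x, g, mean)), as.vector(tapply(y, g, mean)))
}

observed <- between_r(x, y, g)

ntrials <- 200
null_r  <- replicate(ntrials, between_r(x, y, sample(g)))  # shuffled labels

# Proportion of shuffled results at least as extreme as the observed value
p_value <- mean(abs(null_r) >= abs(observed))
c(observed = observed, p_value = p_value)
```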


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
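To see why a correlation matrix is enough, note that the standardized regression weights solve Rxx β = rxy and that the squared multiple correlation is the sum of β times rxy. The base R sketch below checks this against lm on simulated, standardized data. It is an illustration with made-up variables and coefficients, not the setCor source, which also reports standard errors, set correlations, and more.

```r
# Sketch: multiple regression computed from a correlation matrix alone.
set.seed(7)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 0.4 * x1 + 0.3 * x2 + rnorm(n)
d  <- data.frame(y, x1, x2)

R   <- cor(d)
Rxx <- R[c("x1", "x2"), c("x1", "x2")]   # predictor intercorrelations
rxy <- R[c("x1", "x2"), "y"]             # predictor-criterion correlations

beta <- solve(Rxx, rxy)    # standardized regression weights
R2   <- sum(beta * rxy)    # squared multiple correlation

# The same quantities from raw (standardized) data with lm
z   <- as.data.frame(scale(d))
fit <- lm(y ~ x1 + x2, data = z)

round(rbind(from_matrix = beta, from_lm = coef(fit)[-1]), 3)
round(c(from_matrix = R2, from_lm = summary(fit)$r.squared), 3)
```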

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.69     0.63          0.50      0.58         0.48

multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.48     0.40          0.25      0.34         0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary Sent.Completion First.Letters
     3.69       3.88            3.00          1.35

Unweighted multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.59     0.58          0.49      0.58         0.45

Unweighted multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.34     0.34          0.24      0.33         0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.58     0.46         0.21      0.18        0.30

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
          0.331    0.210        0.043     0.032       0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion FirstLetters
          1.02         1.02

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.44     0.35         0.17      0.14        0.26

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.19     0.12         0.03      0.02        0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
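The residual matrix above is what remains of the criterion correlations once the predictors have been partialled out. A minimal sketch of that computation in base R follows. It uses simulated data rather than the Thurstone matrix, and the names `z`, `y`, `resid_R`, and `partial_R` are illustrative assumptions, not part of psych. It is not a description of how setCor is implemented internally.

```r
# Sketch: residual (partialled) correlations from a correlation matrix.
# Simulated data; all object names here are hypothetical.
set.seed(3)
n <- 400
z <- matrix(rnorm(n * 2), n, 2)                  # variables to be removed
y <- z %*% matrix(c(0.5, 0.2, 0.1, 0.6), 2, 2) + # criteria that depend on z
  matrix(rnorm(n * 2), n, 2)

R   <- cor(cbind(z, y))
Rzz <- R[1:2, 1:2]   # correlations among the removed variables
Rzy <- R[1:2, 3:4]   # removed variables with criteria
Ryy <- R[3:4, 3:4]   # correlations among the criteria

# Residual covariance of the standardized criteria: Ryy - Ryz Rzz^-1 Rzy
resid_R <- Ryy - t(Rzy) %*% solve(Rzz) %*% Rzy

# Rescaling the residual covariances gives the partial correlations
partial_R <- cov2cor(resid_R)

# Cross-check against residuals from an ordinary regression on z-scores
e <- resid(lm(scale(y) ~ scale(z)))
resid_lm <- cov(e)
```

The diagonal of `resid_R` is 1 minus the squared multiple correlation of each criterion, which is why the diagonal of the printed residual matrix (e.g., 0.52 for FourLetterWords) matches 1 minus the multiple R2 reported when all four predictors are used.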

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1,2,...i) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
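The path logic can be sketched with ordinary regressions in base R. The example below uses simulated data (not the Preacher and Hayes data), writes the total effect as `c_total` and the direct effect with m in the model as `c_prime`, and uses a simple percentile bootstrap. These names and the bootstrap details are assumptions made for illustration and need not match what mediate does internally.

```r
# Sketch: a, b, total and direct paths, and a percentile bootstrap of ab.
# Simulated data; object names are hypothetical.
set.seed(42)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)            # mediator depends on x
y <- 0.3 * x + 0.4 * m + rnorm(n)  # outcome depends on x and m

a       <- coef(lm(m ~ x))["x"]    # path from x to m
fit_y   <- lm(y ~ x + m)
b       <- coef(fit_y)["m"]        # path from m to y, controlling for x
c_prime <- coef(fit_y)["x"]        # direct effect of x with m in the model
c_total <- coef(lm(y ~ x))["x"]    # total effect of x on y
ab      <- a * b                   # indirect effect

# Percentile bootstrap of the indirect effect (500 resamples, for speed)
boot_ab <- replicate(500, {
  i   <- sample.int(n, replace = TRUE)
  a_i <- coef(lm(m[i] ~ x[i]))[2]
  b_i <- coef(lm(y[i] ~ x[i] + m[i]))[3]
  a_i * b_i
})
ci <- quantile(boot_ab, c(0.025, 0.975))
```

For least squares fits the total effect decomposes exactly into the direct effect plus ab, which is why the output below reports 0.76 = 0.43 + 0.33.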


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76 SE = 0.31 t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 SE = 0.32 t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.04 Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c( "SATV","SATQ"),x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c( "SATV" ),x = c("education","age"), m= "ACT", data =sat.act,std=TRUE,n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model". THERAPY -> ATTRIB = 0.82; ATTRIB -> SATIS = 0.4; THERAPY -> SATIS c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models". THERAPY -> SATIS = 0.43; ATTRIB -> SATIS = 0.4; THERAPY with ATTRIB = 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.


for speed. The default number of boot straps is 5000.
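Moderation in a regression model is conventionally represented by a product term, which is what the ACTXgndr variable in the moderated example stands for. A minimal base R sketch on simulated data follows. Mean-centering the predictors before forming the product is an assumption made here for illustration and may differ from what mediate does internally; all object names are hypothetical.

```r
# Sketch: moderated regression as a product of centered predictors.
# Simulated data; names are hypothetical.
set.seed(11)
n <- 300
x <- rnorm(n)                 # continuous predictor
w <- rbinom(n, 1, 0.5)        # binary moderator
y <- 0.5 * x + 0.2 * w + 0.3 * x * w + rnorm(n)

xc <- x - mean(x)             # center the predictor
wc <- w - mean(w)             # center the moderator
xw <- xc * wc                 # product (interaction) term

fit_manual  <- lm(y ~ xc + wc + xw)  # product term entered by hand
fit_formula <- lm(y ~ xc * wc)       # same model via the formula interface

interaction_weight <- coef(fit_manual)["xw"]
```

A product term can only be formed from the raw observations, which is why a moderated model cannot be fit from a correlation matrix alone.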

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$
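As a rough illustration of the definition, the following base R sketch computes the squared canonical correlations as the eigenvalues of the standard canonical-correlation matrix Rxx^-1 Rxy Ryy^-1 Ryx and combines them into the set correlation. The data are simulated, and `set_cor_r2` is a hypothetical helper written for this example, not a psych function.

```r
# Sketch: set correlation from the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx.
# Simulated data; set_cor_r2 is a hypothetical helper.
set.seed(1)
n <- 300
x <- matrix(rnorm(n * 3), n, 3)
y <- x %*% diag(c(0.6, 0.3, 0.1)) + matrix(rnorm(n * 3), n, 3)

R   <- cor(cbind(x, y))
Rxx <- R[1:3, 1:3]
Ryy <- R[4:6, 4:6]
Rxy <- R[1:3, 4:6]

set_cor_r2 <- function(Rxx, Ryy, Rxy) {
  M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
  # Eigenvalues are real in theory; Re() guards against numerical noise
  lambda <- sort(Re(eigen(M)$values), decreasing = TRUE)
  list(lambda = lambda,                 # squared canonical correlations
       set_R2 = 1 - prod(1 - lambda),   # Cohen's set correlation
       avg_cc2 = mean(lambda))          # average squared canonical correlation
}

xy <- set_cor_r2(Rxx, Ryy, Rxy)
yx <- set_cor_r2(Ryy, Rxx, t(Rxy))      # roles of the two sets reversed

# Cross-check against the canonical correlations from stats::cancor
cc2 <- cancor(x, y)$cor^2
```

Because the product runs over all eigenvalues, the set correlation is never smaller than the largest squared canonical correlation, whereas the average squared canonical correlation is never larger than it. The two calls also give the same value with the sets reversed, since the statistic is symmetric in x and y.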

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic, based upon the average canonical correlation, might then be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
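Regression weights can be recovered from a correlation matrix because the standardized normal equations involve only correlations. The base R sketch below uses simulated data with hypothetical variable names. It solves for the standardized β weights, the squared multiple correlation, and the variance inflation factors directly from the correlation matrix, and then checks them against lm on standardized scores. It illustrates the algebra only and is not taken from the setCor source.

```r
# Sketch: standardized regression from a correlation matrix.
# Simulated data; variable names are hypothetical.
set.seed(7)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$x3 <- 0.5 * d$x1 + rnorm(n)
d$y  <- 0.4 * d$x1 - 0.2 * d$x2 + 0.3 * d$x3 + rnorm(n)

preds <- c("x1", "x2", "x3")
R   <- cor(d)
Rxx <- R[preds, preds]        # predictor intercorrelations
rxy <- R[preds, "y"]          # predictor-criterion correlations

beta <- solve(Rxx, rxy)       # standardized beta weights
R2   <- sum(beta * rxy)       # squared multiple correlation
vif  <- diag(solve(Rxx))      # VIF = 1/(1 - SMC) for each predictor

# The same quantities from the raw data, using z-scores
zd  <- as.data.frame(scale(d))
fit <- lm(y ~ x1 + x2 + x3, data = zd)
smc_x1 <- summary(lm(x1 ~ x2 + x3, data = zd))$r.squared
```

Standard errors and significance tests additionally require the number of observations, which is why that number has to be supplied when the input is a matrix.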

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58 SE = 0.03 t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59 SE = 0.03 t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.02 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14 SE = 0.03 t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14 SE = 0.03 t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.01 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 0.03 t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0 SE = 0.03 t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = 0 Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: path diagram titled "Moderation model" with predictors ACT, gender, and ACTXgndr, mediator education, and outcome SATQ.]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender education age
  1.01      1.45 1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1] 3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1 5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
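To make the idea of such a conversion concrete, here is a small base R sketch that writes a data frame out as a LaTeX tabular. The function `df_to_latex` is a hypothetical helper written for this example. It is not the psych implementation, does not follow the argument list of df2latex, and does not escape special characters such as underscores.

```r
# Sketch: turn a data frame into the lines of a LaTeX table.
# df_to_latex is a hypothetical helper, not a psych function.
df_to_latex <- function(df, digits = 2, caption = NULL) {
  # Format numeric columns to a fixed number of decimals
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(v) formatC(v, format = "f", digits = digits))

  header <- paste(c("Variable", names(df)), collapse = " & ")
  body   <- paste(rownames(df),
                  apply(as.matrix(df), 1, paste, collapse = " & "),
                  sep = " & ")
  align  <- paste0("l", strrep("r", ncol(df)))  # labels left, numbers right

  c("\\begin{table}[htpb]",
    if (!is.null(caption)) paste0("\\caption{", caption, "}"),
    paste0("\\begin{tabular}{", align, "}"),
    "\\hline",
    paste0(header, " \\\\"),
    "\\hline",
    paste0(body, " \\\\"),
    "\\hline",
    "\\end{tabular}",
    "\\end{table}")
}

loadings <- data.frame(MR1 = c(0.91, 0.89), MR2 = c(-0.04, 0.06),
                       row.names = c("Sentences", "Vocabulary"))
tex <- df_to_latex(loadings, caption = "Example loadings")
cat(tex, sep = "\n")
```

The resulting lines can be pasted into a LaTeX document or written to a file with writeLines.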

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex
A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
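To make concrete what "show the lower diagonal" means for a correlation table, here is a small base-R sketch. It is not the cor2latex implementation and does not reproduce its output format; the helper lower_tri_latex is hypothetical, and the attitude data frame from base R is used only as example input. It builds the body rows of a LaTeX tabular with the upper triangle left blank.

```r
# Hypothetical helper (not part of psych): format the lower triangle of a
# correlation matrix as the body rows of a LaTeX tabular.
lower_tri_latex <- function(r, digits = 2) {
  vapply(seq_len(nrow(r)), function(i) {
    # entries on and below the diagonal for row i
    cells <- formatC(r[i, seq_len(i)], format = "f", digits = digits)
    # blank cells for the upper triangle
    cells <- c(cells, rep("", ncol(r) - i))
    # "name & cell & cell ... \\" as LaTeX expects for a table row
    paste0(paste(c(rownames(r)[i], cells), collapse = " & "), " \\\\")
  }, character(1))
}

# Illustrative input: three variables from the base-R 'attitude' data.
r <- cor(attitude[, 1:3])
rows <- lower_tri_latex(r)
cat(rows, sep = "\n")
```

The printed rows can be pasted between \begin{tabular} and \end{tabular}; the psych functions add the headings, rules, and captions needed for a complete APA style table.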


                                                                    7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
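Several of the helpers in this list wrap short calculations. The sketch below shows, in base R only, the arithmetic that fisherz, geometric.mean, harmonic.mean, reverse.code and superMatrix are described as performing. It is an illustration under that assumption rather than the package code, and the example values (r, x, item, A, B, and a 1 to 6 response scale) are made up for the demonstration.

```r
# Base-R sketches of the calculations; none of these lines call psych.

# Fisher z transformation of a correlation (cf. fisherz)
r <- 0.5
z <- atanh(r)                      # same as 0.5 * log((1 + r) / (1 - r))

# Geometric and harmonic means (cf. geometric.mean, harmonic.mean)
x <- c(1, 2, 4, 8)
geo  <- exp(mean(log(x)))          # suited to ratios and growth rates
harm <- 1 / mean(1 / x)            # suited to rates such as speeds

# Reverse coding on an assumed 1-6 response scale (cf. reverse.code):
# new score = (max + min) - old score
item <- c(1, 2, 6, 4)
reversed <- (6 + 1) - item

# "Super matrix": A top left, B lower right, 0s elsewhere (cf. superMatrix)
A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 2)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))
S
```

With psych attached, the corresponding calls would be fisherz(r), geometric.mean(x), harmonic.mean(x) and superMatrix(A, B); the base-R versions are shown so that the example runs without the package.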

                                                                    8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

                                                                    9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
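As an alternative to the package installer menus, the source version can usually be installed from the R console. The following is a sketch that assumes the repository address given above is still current and that the tools needed to build source packages are available; the wrapper name install_psych_dev is hypothetical and is used only so that nothing is installed until the function is called.

```r
# Hypothetical wrapper around install.packages(); defining it installs nothing.
# 'repo' defaults to the repository named in the text (assumed still current).
install_psych_dev <- function(repo = "http://personality-project.org/r") {
  # type = "source" asks R to build the package from the source file
  install.packages("psych", repos = repo, type = "source")
}

# install_psych_dev()   # uncomment to install the development version
```

After installation, restart R or reload the package so that the new version is the one in use.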

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")


                                                                    10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                    11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0
[9] lattice_0.20-34


                                                                    References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.




> draw.tetra()

[Figure: a bivariate normal distribution with rho = 0.5 and phi = 0.33, cut at thresholds τ on X and Τ on Y into four regions (X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ), with the marginal normal densities dnorm(x) drawn for each variable.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Figure: perspective plot of the bivariate normal density (axes x, y, z), titled "Bivariate density rho = 0.5".]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.
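The difference between φ and the underlying correlation can be illustrated with a short base R sketch. This is not the estimator used by psych; it assumes both variables are cut at 0 (a median split), in which case the bivariate normal gives the closed form phi = (2/π) asin(rho). The sample size and variable names are illustrative only. For other thresholds there is no such shortcut, and the correlation has to be found by numerical optimization as the caption describes.

```r
# Simulate a bivariate normal with latent correlation rho = 0.5
set.seed(17)
n   <- 100000
rho <- 0.5
z1  <- rnorm(n)
z2  <- rnorm(n)
x   <- z1
y   <- rho * z1 + sqrt(1 - rho^2) * z2

# Dichotomize both variables at 0 (an assumed median split)
x_cut <- as.numeric(x > 0)
y_cut <- as.numeric(y > 0)

# phi is simply the Pearson r of the dichotomized scores
phi <- cor(x_cut, y_cut)

# Closed form for median splits, and its inverse
phi_expected  <- (2 / pi) * asin(rho)   # exactly 1/3 when rho = 0.5
rho_recovered <- sin(pi * phi / 2)      # back-transform phi to the latent scale

round(c(latent_r = cor(x, y), phi = phi, recovered = rho_recovered), 2)
```

The observed φ is noticeably smaller than the latent correlation, which is why a correction of this kind is needed when dichotomous items are assumed to reflect continuous normal traits.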

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
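The decomposition of the overall correlation into within group and between group parts can be checked by hand. The following is a minimal base R sketch on simulated data, not the statsBy implementation; the grouping structure, effect sizes, and variable names are assumptions made for illustration.

```r
# Simulated two-level data: 4 groups of 25 observations each
set.seed(42)
group <- rep(1:4, each = 25)
x <- rnorm(100) + group
y <- 0.5 * x - 0.8 * group + rnorm(100)

# Between part: each person's group mean; within part: deviation from it
x_bg <- ave(x, group)
y_bg <- ave(y, group)
x_wg <- x - x_bg
y_wg <- y - y_bg

# Pooled within group correlation and (size weighted) between group correlation
r_wg <- cor(x_wg, y_wg)
r_bg <- cor(x_bg, y_bg)

# eta: correlation of the raw data with the within values or the group means
eta_x_wg <- cor(x, x_wg)
eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg)
eta_y_bg <- cor(y, y_bg)

# Recompose the overall correlation from the two parts
r_recomposed <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
all.equal(cor(x, y), r_recomposed)
```

Because the group means are repeated for every member of a group, the between group correlation computed this way is automatically weighted by group size.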

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)

faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.
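Regression from a correlation matrix rests on the standard result that the standardized weights solve Rxx β = rxy, with R² = β'rxy. The base R sketch below illustrates that result on simulated data and compares it with lm on standardized raw scores; it is not the setCor code itself, and the variable names and effect sizes are assumptions.

```r
# Simulated raw data with two correlated predictors
set.seed(3)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 0.6 * x1 + 0.3 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

# Everything below uses only the correlation matrix
R   <- cor(dat)
Rxx <- R[c("x1", "x2"), c("x1", "x2")]   # predictor intercorrelations
rxy <- R[c("x1", "x2"), "y"]             # predictor-criterion correlations

beta <- solve(Rxx, rxy)   # standardized regression weights
R2   <- sum(beta * rxy)   # squared multiple correlation

# The same quantities from the standardized raw data, for comparison
z   <- as.data.frame(scale(dat))
fit <- lm(y ~ x1 + x2, data = z)

all.equal(unname(beta), unname(coef(fit)[c("x1", "x2")]))
all.equal(R2, summary(fit)$r.squared)
```

Standard errors and significance tests additionally require the number of observations, which a correlation matrix alone does not carry.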

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.69     0.63          0.50      0.58         0.48

multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.48     0.40          0.25      0.34         0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sentences Vocabulary Sent.Completion First.Letters
     3.69       3.88            3.00          1.35

Unweighted multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.59     0.58          0.49      0.58         0.45

Unweighted multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.34     0.34          0.24      0.33         0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.58     0.46          0.21      0.18         0.30

multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
            0.331    0.210         0.043     0.032        0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion First.Letters
           1.02          1.02

Unweighted multiple R
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.44     0.35          0.17      0.14         0.26

Unweighted multiple R2
Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
             0.19     0.12          0.03      0.02         0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,...i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. (In the mediate output the direct effect is labeled c', and c is reserved for the total effect of x on y.) Statistical tests of the ab effect are best done by bootstrapping.
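The logic of the a, b, and direct paths, and of bootstrapping the ab product, can be sketched in base R. This is an illustration on simulated data rather than the mediate function; the helper indirect(), the effect sizes, and the 500 resamples are assumptions chosen to keep the example quick.

```r
# Simulated data: x affects y directly and through the mediator m
set.seed(1)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.2 * x + rnorm(n)
dat <- data.frame(x, m, y)

# Hypothetical helper: indirect effect ab from two ordinary regressions
indirect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]        # path a: x -> m
  b <- coef(lm(y ~ x + m, data = d))["m"]    # path b: m -> y, controlling for x
  unname(a * b)
}

ab       <- indirect(dat)
c_total  <- unname(coef(lm(y ~ x, data = dat))["x"])      # total effect of x
c_direct <- unname(coef(lm(y ~ x + m, data = dat))["x"])  # direct effect, m controlled

# For least squares with one mediator, total = direct + indirect
all.equal(c_total, c_direct + ab)

# Percentile bootstrap of ab: resample rows with replacement
boot_ab <- replicate(500, indirect(dat[sample(n, replace = TRUE), ]))
ci <- quantile(boot_ab, c(0.025, 0.975))
ci
```

The product ab is bootstrapped because its sampling distribution is generally not normal, so a confidence interval based on resampling is preferred to a normal theory test.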

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76 SE = 0.31 t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 SE = 0.32 t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.04 Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model". THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, THERAPY to SATIS with c = 0.76 and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (c) and the direct effect with Attribution controlled is .43 (c'), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)

> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models". THERAPY to SATIS = 0.43, ATTRIB to SATIS = 0.4, with 0.21 between the two predictors.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
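Both statistics can be computed directly from the squared canonical correlations. The base R sketch below is an illustration rather than the setCor code: the simulated data deliberately contain one strongly related pair (x1 and y1) among otherwise unrelated variables, and all names and effect sizes are assumptions. The last lines reuse the squared canonical correlations printed for the Thurstone example in section 5.1.

```r
# Two x variables and three y variables; only x1 and y1 are strongly related
set.seed(7)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y1 <- 0.9 * x1 + rnorm(n, sd = 0.4)
y2 <- rnorm(n)
y3 <- rnorm(n)

R   <- cor(cbind(x1, x2, y1, y2, y3))
xi  <- 1:2
yi  <- 3:5
Rxx <- R[xi, xi]
Ryy <- R[yi, yi]
Rxy <- R[xi, yi]

# Squared canonical correlations: eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M, only.values = TRUE)$values)

set_R2  <- 1 - prod(1 - lambda)   # Cohen's set correlation
avg_cc2 <- mean(lambda)           # average squared canonical correlation

# Equivalent determinant form of the set correlation
all.equal(set_R2, 1 - det(R) / (det(Rxx) * det(Ryy)))
round(c(set_R2 = set_R2, average = avg_cc2), 2)

# The same arithmetic applied to the values printed for the Thurstone example
lambda_printed <- c(0.6280, 0.1478, 0.0076, 0.0049)
round(1 - prod(1 - lambda_printed), 2)   # 0.69
round(mean(lambda_printed), 2)           # 0.2
```

A single strong pair is enough to push the set correlation up, while the average squared canonical correlation stays much lower, which is the situation the paragraph above warns about.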

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
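To see why a correlation matrix is sufficient, recall that the standardized regression weights solve Rxx β = Rxy and that the squared multiple correlation is β'Rxy. The following base-R sketch (not the setCor implementation; the `mtcars` variables are chosen only for illustration) compares that matrix solution with lm applied to standardized raw data.

```r
# Standardized regression weights from a correlation matrix alone.
vars <- c("mpg", "wt", "hp", "qsec")
R <- cor(mtcars[, vars])
x <- c("wt", "hp", "qsec")   # predictor set
y <- "mpg"                   # criterion

beta <- solve(R[x, x], R[x, y])   # solves Rxx %*% beta = Rxy
R2 <- sum(beta * R[x, y])         # squared multiple correlation

# The same analysis from raw data, after converting to z scores.
z <- as.data.frame(scale(mtcars[, vars]))
fit <- lm(mpg ~ wt + hp + qsec, data = z)

print(round(cbind(from_matrix = beta, from_lm = coef(fit)[-1]), 3))
```

The two columns agree, and `R2` matches `summary(fit)$r.squared`. Significance tests additionally need the number of observations, which is why it must be supplied when only a matrix is given.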

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation, covariance, or raw data matrix, and thus reports standardized or raw β weights depending on the input. Second, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02   Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01   Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0   Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: path diagram titled "Moderation model", with ACT, gender, and ACTXgndr as predictors, education as the mediator, and SATQ as the outcome.]

Figure 18: Moderated multiple regression requires the raw data.
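The estimates printed with this figure follow the usual mediation decomposition: the total effect c equals the direct effect c' plus the indirect effect ab (here 0.58 ≈ 0.59 + (-0.01) for ACT). A minimal base-R sketch of that decomposition with a percentile bootstrap is given below. It is not the mediate function: the helper `indirect`, the `mtcars` variables, the 500 resamples, and the percentile interval are all assumptions made for illustration.

```r
# Sketch of a simple mediation analysis: x = wt, mediator m = hp, y = mpg.
set.seed(42)
d <- mtcars[, c("mpg", "wt", "hp")]

# Indirect effect ab: a is the x -> m slope, b is the m -> y slope given x.
indirect <- function(dat) {
  a <- coef(lm(hp ~ wt, data = dat))["wt"]
  b <- coef(lm(mpg ~ wt + hp, data = dat))["hp"]
  unname(a * b)
}

c_total  <- unname(coef(lm(mpg ~ wt, data = d))["wt"])        # c
c_direct <- unname(coef(lm(mpg ~ wt + hp, data = d))["wt"])   # c'
ab <- indirect(d)

# Percentile bootstrap of the indirect effect: resample rows with replacement.
n_iter <- 500
boot_ab <- replicate(n_iter, indirect(d[sample(nrow(d), replace = TRUE), ]))
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(c(c = c_total, c_prime = c_direct, ab = ab), 3))
print(round(ci, 3))
```

With ordinary least squares the identity c = c' + ab holds exactly, so the bootstrap is needed only to put an interval around ab, whose sampling distribution is typically not normal.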

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
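Several of the printed summary values can be checked by hand from the other (rounded) values in the two outputs. The short sketch below assumes the usual adjusted R2 and F formulas for a regression with k predictors and n observations, and uses the set correlation formula given earlier. The numbers are copied from the output, so agreement is only up to rounding.

```r
# Values copied from the printed output for the ACT regression.
R2 <- 0.0272   # multiple R2
n  <- 700      # number of observations
k  <- 3        # number of predictors

shrunken <- 1 - (1 - R2) * (n - 1) / (n - k - 1)    # adjusted (shrunken) R2
F_stat   <- (R2 / k) / ((1 - R2) / (n - k - 1))     # F on k and n - k - 1 df

# Squared canonical correlations copied from the setCor output.
cc2 <- c(0.050, 0.033, 0.008)
set_R2 <- 1 - prod(1 - cc2)   # Cohen's set correlation
avg_cc2 <- mean(cc2)          # average squared canonical correlation

print(round(c(shrunken = shrunken, F = F_stat,
              set_R2 = set_R2, avg_cc2 = avg_cc2), 4))
```

The results round to the reported 0.0230, 6.49, 0.09 and 0.03. The first two also match the adjusted R-squared and F-statistic of the lm summary, which shows that the ACT column of setCor is the same regression.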

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
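To make concrete what such a conversion involves, the following base-R sketch writes the lower triangle of a correlation matrix as a LaTeX tabular. The helper `lower_tri_latex` is hypothetical and is not one of the psych functions; it only illustrates the idea behind cor2latex, and the `mtcars` variables are an arbitrary example.

```r
# Hypothetical helper (not part of psych): format the lower triangle of a
# correlation matrix as a LaTeX tabular environment.
lower_tri_latex <- function(r, digits = 2) {
  stopifnot(is.matrix(r), nrow(r) == ncol(r))
  n <- nrow(r)
  cells <- formatC(r, format = "f", digits = digits)  # keeps matrix shape
  cells[upper.tri(cells)] <- ""                       # blank the upper triangle
  header <- paste(c("Variable", colnames(r)), collapse = " & ")
  rows <- vapply(seq_len(n), function(i) {
    paste(c(rownames(r)[i], cells[i, ]), collapse = " & ")
  }, character(1))
  c(paste0("\\begin{tabular}{l", strrep("r", n), "}"),
    paste0(header, " \\\\"),
    paste0(rows, " \\\\"),
    "\\end{tabular}")
}

r <- cor(mtcars[, c("mpg", "hp", "wt")])
tab <- lower_tri_latex(r)
cat(tab, sep = "\n")
```

The printed lines can be pasted into a LaTeX document. The psych functions add APA-style rules, captions and labels on top of this basic layout.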

                                                                      7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
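A few of the helpers in this list compute standard quantities that are easy to state in base R. The definitions below are textbook formulas written as hypothetical stand-ins (the snake_case names are not psych functions); they are meant to show what fisherz, geometric.mean, harmonic.mean and superMatrix compute, not how psych implements them.

```r
# Fisher z transform of a correlation; identical to atanh(r).
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))

# Geometric mean: exponential of the mean log (for positive values).
geometric_mean <- function(x) exp(mean(log(x)))

# Harmonic mean: reciprocal of the mean reciprocal.
harmonic_mean <- function(x) 1 / mean(1 / x)

# Block "super matrix": A on the top left, B on the lower right, 0s elsewhere.
super_matrix <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

z  <- fisher_z(0.5)
gm <- geometric_mean(c(1, 4, 16))
hm <- harmonic_mean(c(1, 2, 4))
S  <- super_matrix(diag(2), matrix(1, nrow = 1, ncol = 3))
print(S)
```

In the example, `S` is a 3 x 5 matrix whose off-diagonal blocks are all zero, which is the layout used when combining scoring keys for separate item sets.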

                                                                      8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")

                                                                      10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                      11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

                                                                      References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to “mathematical factors.” Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



> draw.cor(expand=20,cuts=c(0,0))

[Figure: bivariate density plot titled "Bivariate density rho = 0.5", with axes x, y, and z]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. The tetrachoric correlation is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}} \tag{1}$$

where $r_{xy}$ is the observed (total) correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values (for the wg terms) or with the group means (for the bg terms).
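Equation (1) can be checked numerically with base R alone. The sketch below is illustrative and is not how statsBy itself is implemented: the data are simulated, the group sizes and effect sizes are made up, and all variable names are hypothetical. Each person's score is split into a group mean (between part) and a deviation from that mean (within part), and the right-hand side of equation (1) is compared with the ordinary correlation.

```r
# Simulated two-level data; group structure and effect sizes are made up
set.seed(42)
n_groups <- 6
n_per    <- 20
g <- rep(seq_len(n_groups), each = n_per)

# Group-level effects that are positively related between groups ...
gx <- rnorm(n_groups)
gy <- 0.8 * gx + rnorm(n_groups, sd = 0.5)
# ... and individual deviations that are negatively related within groups
ex <- rnorm(n_groups * n_per)
ey <- -0.5 * ex + rnorm(n_groups * n_per)
x <- gx[g] + ex
y <- gy[g] + ey

# Between part: each person gets the group mean; within part: the deviation
x_bg <- ave(x, g); x_wg <- x - x_bg
y_bg <- ave(y, g); y_wg <- y - y_bg

# eta: correlation of the raw data with the within or between parts
eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)

r_wg <- cor(x_wg, y_wg)   # pooled within group correlation
r_bg <- cor(x_bg, y_bg)   # between group correlation, weighted by group size

r_xy  <- cor(x, y)
recon <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
all.equal(r_xy, recon)    # the two sides of equation (1) agree
```

Because the within and between parts are uncorrelated by construction, the identity holds exactly (up to floating point error) for any grouping variable, even when the within and between correlations have opposite signs as they do here.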

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
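The idea of randomizing the grouping variable can be sketched in base R. This is a minimal illustration of the permutation logic for a single statistic, not the statsBy.boot implementation; the data, the helper within_r, and the value of ntrials are all hypothetical.

```r
# Simulated data with five groups whose means differ on x (made-up values)
set.seed(1)
g <- rep(1:5, each = 12)
x <- rnorm(60) + g / 2
y <- 0.6 * x + rnorm(60)

# Pooled within group correlation: correlate deviations from group means
within_r <- function(x, y, g) cor(x - ave(x, g), y - ave(y, g))

observed <- within_r(x, y, g)

# Shuffle the group labels ntrials times and recompute the statistic
ntrials  <- 200
permuted <- replicate(ntrials, within_r(x, y, sample(g)))

# Summarize the permutation distribution for the statistic of interest
quantile(permuted, c(0.025, 0.5, 0.975))
observed
```

Comparing the observed value with the quantiles of the shuffled values shows how much of the statistic depends on the actual grouping rather than on an arbitrary partition of the same cases.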


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.69              0.63              0.50              0.58              0.48

multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.48              0.40              0.25              0.34              0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.59              0.58              0.49              0.58              0.45

Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.34              0.34              0.24              0.33              0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.58     0.46         0.21      0.18        0.30

multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
          0.331    0.210        0.043     0.032       0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion FirstLetters
          1.02         1.02

Unweighted multiple R
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.44     0.35         0.17      0.14        0.26

Unweighted multiple R2
FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
           0.19     0.12         0.03      0.02        0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
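The idea of a residual matrix with covariates removed can be sketched in base R. The example below uses simulated data (the names z1, z2, y1, y2 are invented for illustration) and assumes the standard partialling formula Ryy - Ryz Rzz^-1 Rzy. It is a sketch of the concept, not necessarily the computation setCor performs internally.

```r
# Simulated data: two covariates (z) and two criterion variables (y).
set.seed(1)
n <- 500
z <- matrix(rnorm(n * 2), n, 2)
B <- matrix(c(0.6, 0.3, 0.2, 0.5), 2, 2)        # arbitrary weights
y <- z %*% B + matrix(rnorm(n * 2), n, 2)
colnames(z) <- c("z1", "z2")
colnames(y) <- c("y1", "y2")

R  <- cor(cbind(z, y))
zi <- 1:2                                       # covariate columns
yi <- 3:4                                       # criterion columns

# Residual (partial) covariance of y in standardized units after removing z
resid_mat <- R[yi, yi] - R[yi, zi] %*% solve(R[zi, zi]) %*% R[zi, yi]

# The same matrix from regression residuals on standardized raw data
s <- scale(cbind(z, y))
e <- resid(lm(s[, yi] ~ s[, zi]))
resid_check <- cov(e)

print(round(resid_mat, 2))
print(round(cov2cor(resid_mat), 2))             # rescaled to partial correlations
```

The diagonal of the residual matrix is the proportion of variance in each criterion variable left unexplained by the covariates, which is why the diagonal entries are below 1.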

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
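The logic of the a, b, and ab paths can be sketched with ordinary least squares and a percentile bootstrap in base R. The data here are simulated (they are not the Preacher and Hayes data), the helper path_estimates is hypothetical, and the number of bootstrap samples is kept small for speed. The sketch reports both the total effect of x (x alone) and the direct effect of x (with m in the model).

```r
# Simulated data with a partly mediated effect of x on y
set.seed(42)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)              # x -> m (path a)
y <- 0.4 * m + 0.3 * x + rnorm(n)    # m -> y (path b) plus a direct path
dat <- data.frame(x, m, y)

# Hypothetical helper: OLS estimates of the paths for one data set
path_estimates <- function(d) {
  a       <- unname(coef(lm(m ~ x, data = d))["x"])   # x -> m
  fit_y   <- coef(lm(y ~ x + m, data = d))
  b       <- unname(fit_y["m"])                       # m -> y, controlling for x
  c_prime <- unname(fit_y["x"])                       # direct effect of x
  c_total <- unname(coef(lm(y ~ x, data = d))["x"])   # total effect of x
  c(a = a, b = b, ab = a * b, c_total = c_total, c_prime = c_prime)
}
est <- path_estimates(dat)

# Percentile bootstrap of the indirect effect ab
n_boot <- 500
boot_ab <- replicate(n_boot, {
  idx <- sample(n, replace = TRUE)
  path_estimates(dat[idx, ])[["ab"]]
})
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(est, 2))
print(round(ci, 2))
```

With least squares estimates, the total effect equals the direct effect plus ab, so the indirect effect is simply the drop in the coefficient of x once m enters the model. If the bootstrap interval for ab excludes zero, that is the usual evidence for mediation.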


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76 SE = 0.31 t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 SE = 0.32 t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.04 Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000.

> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model". THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS: c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2, 3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models". THERAPY to SATIS = 0.43; ATTRIB to SATIS = 0.4; THERAPY with ATTRIB = 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the $i$th eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic, based upon the average canonical correlation, might then be more appropriate.
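The contrast between the two statistics can be checked by hand. The sketch below takes the four squared canonical correlations reported earlier in this section (0.6280, 0.1478, 0.0076, 0.0049) and computes both the set correlation and the average squared canonical correlation. The second vector, lambda_skewed, is made up purely to illustrate the inflation problem.

```r
# Squared canonical correlations as printed in the earlier setCor output
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)

set_R2  <- 1 - prod(1 - lambda)   # Cohen's set correlation R2
avg_cc2 <- mean(lambda)           # average squared canonical correlation

print(round(c(set_R2 = set_R2, avg_cc2 = avg_cc2), 2))
# matches the reported values of 0.69 and 0.2

# Hypothetical case: one strong canonical variate, three negligible ones
lambda_skewed <- c(0.90, 0.01, 0.01, 0.01)
set_R2_skewed  <- 1 - prod(1 - lambda_skewed)
avg_cc2_skewed <- mean(lambda_skewed)
print(round(c(set_R2 = set_R2_skewed, avg_cc2 = avg_cc2_skewed), 2))
```

Because the set correlation is one minus a product, a single large eigenvalue is enough to push it close to 1, while the average is pulled down by the weak variates.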

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
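That a regression can be recovered from the correlation matrix alone follows from the normal equations. The sketch below is base R on simulated data (the variable names are invented) and is not the setCor code itself. It finds standardized β weights, the squared multiple correlation, and the variance inflation factors from a correlation matrix, then checks them against lm fitted to standardized raw data.

```r
# Simulated data with two correlated predictors
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 0.4 * x1 - 0.2 * x2 + rnorm(n)
d  <- data.frame(y, x1, x2)

R   <- cor(d)
Rxx <- R[c("x1", "x2"), c("x1", "x2")]   # predictor intercorrelations
rxy <- R[c("x1", "x2"), "y"]             # predictor-criterion correlations

beta <- solve(Rxx, rxy)                  # standardized beta weights
R2   <- sum(beta * rxy)                  # squared multiple correlation
vif  <- diag(solve(Rxx))                 # 1 / (1 - SMC) for each predictor

# Check against ordinary least squares on standardized raw data
ds  <- as.data.frame(scale(d))
fit <- lm(y ~ x1 + x2, data = ds)

print(round(rbind(from_R = beta, from_lm = coef(fit)[-1]), 3))
print(round(c(R2_from_R = R2, R2_from_lm = summary(fit)$r.squared), 4))
print(round(vif, 2))
```

Standard errors and significance tests additionally need the sample size, which is why the number of observations has to be supplied when only a matrix is given.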

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Second, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act, mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58 SE = 0.03 t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59 SE = 0.03 t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.02 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14 SE = 0.03 t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14 SE = 0.03 t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.01 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 0.03 t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0 SE = 0.03 t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = 0 Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: path diagram titled "Moderation model", with ACT, gender, and ACTXgndr as predictors, education as the mediator, and SATQ as the outcome. Paths to education: 0.16, 0.09, -0.01; education to SATQ: -0.04; c and c' values as in the output above.]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender education age
  1.01      1.45 1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1] 3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1 5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
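The symmetry can be verified numerically. The sketch below is base R on simulated data. The helper set_cor_R2 is hypothetical and assumes the usual canonical correlation matrix Rxx^-1 Rxy Ryy^-1 Ryx; it is not the setCor implementation. It computes the set correlation in both directions and compares the result with the equivalent determinant form 1 - |R| / (|Rxx| |Ryy|).

```r
# Hypothetical helper: set correlation R2 from a correlation matrix,
# given the column indices of the two sets
set_cor_R2 <- function(R, xi, yi) {
  Rxx <- R[xi, xi, drop = FALSE]
  Ryy <- R[yi, yi, drop = FALSE]
  Rxy <- R[xi, yi, drop = FALSE]
  M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
  lambda <- Re(eigen(M, only.values = TRUE)$values)  # squared canonical correlations
  1 - prod(1 - lambda)
}

# Simulated data: three x variables and two y variables sharing a common factor
set.seed(3)
n <- 400
g <- rnorm(n)
X <- sapply(1:3, function(i) 0.6 * g + rnorm(n))
Y <- sapply(1:2, function(i) 0.5 * g + rnorm(n))
R <- cor(cbind(X, Y))

r2_xy  <- set_cor_R2(R, 1:3, 4:5)    # x as predictors, y as criteria
r2_yx  <- set_cor_R2(R, 4:5, 1:3)    # roles reversed
r2_det <- 1 - det(R) / (det(R[1:3, 1:3]) * det(R[4:5, 4:5]))

print(round(c(xy = r2_xy, yx = r2_yx, det = r2_det), 4))
```

Reversing the roles changes the size of the matrix being decomposed (3 by 3 versus 2 by 2) but not its non-zero eigenvalues, so the product, and hence R2, is unchanged.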

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex
A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
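A table of this kind could be produced along the following lines. This is a minimal sketch rather than the exact call used for Table 2: it assumes psych and GPArotation are installed (the default oblimin rotation in fa needs the latter), and the sample size n.obs = 213 is an assumption supplied so that fit statistics can be computed from the correlation matrix.

```r
library(psych)  # fa() with its default oblimin rotation also needs GPArotation

# Thurstone is a 9 x 9 correlation matrix of ability tests shipped with psych.
# n.obs = 213 is an assumed sample size, not taken from the text above.
f3 <- fa(Thurstone, nfactors = 3, n.obs = 213)

# Write the LaTeX source for an APA style table of the loadings to the console.
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
```

The printed LaTeX can then be pasted into a manuscript. df2latex and cor2latex are called in the same way, on a data frame or on a data or correlation matrix respectively.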


                                                                        7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between them.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
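To make a few of these helpers concrete, the following base R sketch computes the quantities behind fisherz, geometric.mean, harmonic.mean, and superMatrix by hand. It shows the underlying arithmetic only and is not the psych implementation; all variable names are illustrative.

```r
# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r)), i.e. atanh(r).
r <- 0.5
z <- 0.5 * log((1 + r) / (1 - r))

# Geometric and harmonic means of a set of positive values.
x <- c(1, 2, 4, 8)
geo_mean  <- exp(mean(log(x)))  # nth root of the product
harm_mean <- 1 / mean(1 / x)    # reciprocal of the mean reciprocal

# Block-diagonal "super matrix": A top left, B lower right, zeros elsewhere.
A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 3)
super <- rbind(
  cbind(A, matrix(0, nrow(A), ncol(B))),
  cbind(matrix(0, nrow(B), ncol(A)), B)
)
super
```

With psych loaded, the corresponding calls would be fisherz(r), geometric.mean(x), harmonic.mean(x), and superMatrix(A, B).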

                                                                        8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
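Once psych is loaded, these data sets can be inspected like any other R object. The sketch below looks at two of them. It assumes a psych version in which sat.act and Thurstone ship with the package itself; in other releases some of the data sets listed above may live in a companion package.

```r
library(psych)

# sat.act: self reported SAT and ACT scores plus age, gender, and education.
dim(sat.act)    # rows are subjects, columns are variables
names(sat.act)

# Thurstone: a correlation matrix of nine ability tests.
dim(Thurstone)
```

The help page for each data set (e.g., ?sat.act) describes its variables and source.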

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
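From the R console, a source install from that repository might look like the sketch below. The helper install_psych_dev is hypothetical (it is not part of psych) and only wraps the standard install.packages call so that nothing is downloaded until it is invoked. It assumes the repository still serves source packages at the URL given above and that the tools needed to build source packages are available.

```r
# Hypothetical helper: nothing is installed until the function is called.
install_psych_dev <- function(repo = "http://personality-project.org/r") {
  # type = "source" asks R to build from the tarball under <repo>/src/contrib.
  install.packages("psych", repos = repo, type = "source")
}

# install_psych_dev()  # uncomment to install; requires network access
```

After installing, restart R before loading the package so that the new version is the one attached.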

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


                                                                        10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                        11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                                                        References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


                                                                        weighted least squares 6withinBetween 37

                                                                        xtable 47

                                                                        62

• Jump starting the psych package: a guide for the impatient
• Psychometric functions are summarized in the second vignette
• Overview of this and related documents
• Getting started
• Basic data analysis
  • Getting the data by using read.file
  • Data input from the clipboard
  • Basic descriptive statistics
    • Outlier detection using outlier
    • Basic data cleaning using scrub
    • Recoding categorical variables into dummy coded variables
  • Simple descriptive graphics
    • Scatter Plot Matrices
    • Density or violin plots
    • Means and error bars
    • Error bars for tabular data
    • Two dimensional displays of means and errors
    • Back to back histograms
    • Correlational structure
    • Heatmap displays of correlational structure
  • Testing correlations
  • Polychoric, tetrachoric, polyserial, and biserial correlations
• Multilevel modeling
  • Decomposing data into within and between level correlations using statsBy
  • Generating and displaying multilevel data
  • Factor analysis by groups
• Multiple Regression, mediation, moderation, and set correlations
  • Multiple regression from data or correlation matrices
  • Mediation and Moderation analysis
  • Set Correlation
• Converting output to APA style tables using LaTeX
• Miscellaneous functions
• Data sets
• Development version and a users guide
• Psychometric Theory
• SessionInfo

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
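The decomposition in equation (1) can be checked numerically. The following is a minimal base R sketch on simulated data; it does not call statsBy, and the variable names (g, x, y, xb, xw and so on) and the simulated effect sizes are assumptions made only for illustration.

```r
# Base R sketch of the within/between decomposition (simulated, hypothetical data).
set.seed(42)
g <- rep(1:5, times = c(10, 20, 30, 15, 25))   # grouping variable, unequal group sizes
x <- rnorm(length(g)) + 0.5 * g
y <- rnorm(length(g)) - 0.3 * g + 0.4 * x

xb <- ave(x, g)   # group means, repeated for every member of the group
yb <- ave(y, g)
xw <- x - xb      # within-group deviations
yw <- y - yb

# eta: correlation of the raw scores with their within and between parts
eta_xwg <- cor(x, xw); eta_ywg <- cor(y, yw)
eta_xbg <- cor(x, xb); eta_ybg <- cor(y, yb)

r_wg <- cor(xw, yw)   # pooled within-group correlation
r_bg <- cor(xb, yb)   # between-group correlation, weighted by group size

rebuilt <- eta_xwg * eta_ywg * r_wg + eta_xbg * eta_ybg * r_bg
c(observed = cor(x, y), rebuilt = rebuilt)
```

Because the group means are repeated once per group member, the between-group correlation is automatically weighted by group size, and the two printed values should agree to within rounding error.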

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
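The beta weights, multiple R and VIF in output like this can all be obtained from the correlation matrix alone. The following base R sketch shows the standard matrix algebra on a small hypothetical correlation matrix (the values and the names x1, x2 and y are invented for illustration); it is not a description of how setCor is implemented internally.

```r
# Hypothetical correlation matrix: two predictors (x1, x2) and one criterion (y).
R <- matrix(c(1.0, 0.5, 0.6,
              0.5, 1.0, 0.4,
              0.6, 0.4, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("x1", "x2", "y"), c("x1", "x2", "y")))
x <- c("x1", "x2")
y <- "y"

Rxx <- R[x, x, drop = FALSE]   # correlations among the predictors
Rxy <- R[x, y, drop = FALSE]   # correlations of the predictors with the criterion

beta <- solve(Rxx, Rxy)        # standardized beta weights: Rxx^-1 Rxy
R2   <- sum(Rxy * beta)        # squared multiple correlation
vif  <- diag(solve(Rxx))       # 1/(1 - SMC) for each predictor

round(beta, 2)
round(c(R = sqrt(R2), R2 = R2), 2)
round(vif, 2)
```

No raw data are needed for the point estimates; the number of observations only matters once standard errors and significance tests are wanted.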

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1,2,\ldots,i}$) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
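To make the paths concrete, here is a minimal base R sketch that estimates a, b and ab by ordinary least squares and bootstraps the indirect effect. The data are simulated and the helper name paths, the effect sizes, and the choice of a simple percentile interval with 500 resamples are all assumptions for illustration; the sketch does not call mediate and should not be read as its internal algorithm.

```r
# Simulated (hypothetical) data in which x affects y partly through m.
set.seed(1)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.3 * x + rnorm(n)
d <- data.frame(x, m, y)

# Least squares estimates of the paths for one data set.
paths <- function(d) {
  a      <- coef(lm(m ~ x, data = d))[["x"]]   # x -> m
  fit    <- coef(lm(y ~ x + m, data = d))
  b      <- fit[["m"]]                         # m -> y, controlling for x
  direct <- fit[["x"]]                         # x -> y, controlling for m
  total  <- coef(lm(y ~ x, data = d))[["x"]]   # x -> y, ignoring m
  c(a = a, b = b, ab = a * b, direct = direct, total = total)
}
est <- paths(d)

# Percentile bootstrap of the indirect effect ab: resample rows with replacement.
n_iter  <- 500
boot_ab <- replicate(n_iter, paths(d[sample(n, replace = TRUE), ])[["ab"]])
ci      <- quantile(boot_ab, c(0.025, 0.975))

round(est, 2)
round(ci, 2)
```

If the bootstrap interval for ab excludes zero, the indirect effect is usually taken to be reliably different from zero. In least squares the effect of x on y ignoring m equals the effect controlling for m plus ab, which is a useful sanity check on any hand computation.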

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se    t   Prob
THERAPY  0.76 0.31  2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
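As a consistency check on these estimates, the total effect in a least squares mediation model should equal the direct effect plus the indirect effect. Reading the printed values as a = 0.82, b = 0.40, a direct effect of 0.43 and a total effect of 0.76 (the label c' for the direct effect is an assumption about the notation), the arithmetic works out to within rounding:

```latex
ab = 0.82 \times 0.40 \approx 0.33,
\qquad
c' + ab \approx 0.43 + 0.33 = 0.76 = c
```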

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, niter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Path diagram "Mediation model": THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS c = 0.76, c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

[Path diagram "Regression Models": THERAPY to SATIS = 0.43; ATTRIB to SATIS = 0.4; 0.21 between THERAPY and ATTRIB]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
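Both statistics are simple functions of the squared canonical correlations. The base R sketch below assumes the standard canonical correlation form, in which the squared canonical correlations are the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx; the function name set_cor and the small correlation matrix are hypothetical, and this is not setCor's own code. The last lines reuse the four squared canonical correlations printed for the Thurstone example, read as 0.6280, 0.1478, 0.0076 and 0.0049.

```r
# Set correlation and average squared canonical correlation (base R sketch).
set_cor <- function(R, x, y) {
  Rxx <- R[x, x, drop = FALSE]
  Ryy <- R[y, y, drop = FALSE]
  Rxy <- R[x, y, drop = FALSE]
  M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
  # eigenvalues of M are the squared canonical correlations (plus zeros)
  lambda <- sort(Re(eigen(M, only.values = TRUE)$values), decreasing = TRUE)
  lambda <- lambda[seq_len(min(length(x), length(y)))]
  list(lambda = lambda,
       set_R2 = 1 - prod(1 - lambda),
       mean_lambda = mean(lambda))
}

# Hypothetical correlation matrix: two predictors and one criterion.
R <- matrix(c(1.0, 0.5, 0.6,
              0.5, 1.0, 0.4,
              0.6, 0.4, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("x1", "x2", "y"), c("x1", "x2", "y")))
res <- set_cor(R, x = c("x1", "x2"), y = "y")
round(res$set_R2, 2)

# The same two summaries from the squared canonical correlations printed earlier.
lambda_out <- c(0.6280, 0.1478, 0.0076, 0.0049)
set_R2_out <- round(1 - prod(1 - lambda_out), 2)
avg_out    <- round(mean(lambda_out), 2)
c(set_R2 = set_R2_out, average = avg_out)
```

With a single y variable the set correlation reduces to the ordinary squared multiple correlation. In the Thurstone values one large canonical correlation drives the set correlation well above the average squared canonical correlation, which is the situation the paragraph above warns about.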

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

                                                                          Call mediate(y = c(SATQ) x = c(ACT) m = education data = satact

                                                                          mod = gender niter = 50 std = TRUE)

                                                                          The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

                                                                          Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

                                                                          Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

                                                                          Indirect effect (ab) of ACT on SATQ through education = -001

                                                                          Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

                                                                          Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

                                                                          Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

                                                                          Indirect effect (ab) of gender on SATQ through education = 0

                                                                          Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

                                                                          Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

                                                                          Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

                                                                          Indirect effect (ab) of ACTXgndr on SATQ through education = 0

                                                                          Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

                                                                          R2 of model = 037

                                                                          To see the longer output specify short = FALSE in the print statement

                                                                          Full output

                                                                          Total effect estimates (c)

                                                                          SATQ se t Prob

                                                                          ACT 058 003 1925 000e+00

                                                                          gender -014 003 -478 210e-06

                                                                          ACTXgndr 000 003 002 985e-01

                                                                          Direct effect estimates (c)SATQ se t Prob

                                                                          ACT 059 003 1926 000e+00

                                                                          gender -014 003 -463 437e-06

                                                                          ACTXgndr 000 003 001 992e-01

                                                                          a effect estimates

                                                                          education se t Prob

                                                                          ACT 016 004 422 277e-05

                                                                          gender 009 004 250 128e-02

                                                                          ACTXgndr -001 004 -015 883e-01

                                                                          b effect estimates

                                                                          SATQ se t Prob

                                                                          education -004 003 -145 0147

                                                                          ab effect estimates

                                                                          SATQ boot sd lower upper

                                                                          ACT -001 -001 001 0 0

                                                                          gender 000 000 000 0 0

                                                                          ACTXgndr 000 000 000 0 0

                                                                          Moderation model

                                                                          ACT

                                                                          gender

                                                                          ACTXgndr

                                                                          SATQ

                                                                          education016 c = 058

                                                                          c = 059

                                                                          009 c = minus014

                                                                          c = minus014

                                                                          minus001 c = 0

                                                                          c = 0

                                                                          minus004

                                                                          minus004

                                                                          minus007

                                                                          002

                                                                          Figure 18 Moderated multiple regression requires the raw data

                                                                          45

                                                                          Min 1Q Median 3Q Max

                                                                          -252458 -32133 07769 35921 92630

                                                                          Coefficients

                                                                          Estimate Std Error t value Pr(gt|t|)

                                                                          (Intercept) 2741706 082140 33378 lt 2e-16

                                                                          gender -048606 037984 -1280 020110

                                                                          education 047890 015235 3143 000174

                                                                          age 001623 002278 0712 047650

                                                                          ---

                                                                          Signif codes 0 0001 001 005 01 1

                                                                          Residual standard error 4768 on 696 degrees of freedom

                                                                          Multiple R-squared 00272 Adjusted R-squared 002301

                                                                          F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
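The multiple R2 for ACT is the same in the lm and setCor output (0.0272) because regression weights depend on the data only through the covariance matrix. The base R sketch below illustrates that point under stated assumptions: it uses the built-in mtcars data rather than sat.act, and it solves the normal equations directly instead of calling setCor.

```r
# Regression weights recovered from a covariance matrix (base R sketch).
x_vars <- c("wt", "hp", "qsec")            # arbitrary predictors
C <- cov(mtcars[, c(x_vars, "mpg")])       # covariance matrix, as with cov(sat.act)

# Raw (unstandardized) slopes: solve Cxx b = Cxy
b_raw <- solve(C[x_vars, x_vars], C[x_vars, "mpg"])

# The same model fitted to the raw data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Standardized weights and R2 from the correlation matrix
R     <- cov2cor(C)
b_std <- solve(R[x_vars, x_vars], R[x_vars, "mpg"])
R2    <- sum(b_std * R[x_vars, "mpg"])     # R2 = beta' r_xy

print(rbind(from_matrix = unname(b_raw), from_lm = unname(coef(fit)[x_vars])))
print(c(R2_from_matrix = R2, R2_from_lm = summary(fit)$r.squared))
```

Significance tests additionally need the sample size, which is why setCor asks for n.obs when it is given a matrix.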

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex
A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
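As a hedged usage sketch, a generic table can be sent through df2latex in a single call. The example assumes the psych package is installed and skips the call otherwise. The mtcars variables and the heading and caption strings are made up for illustration, and the LaTeX that is printed is not reproduced here.

```r
# Convert a small correlation table to LaTeX with psych::df2latex.
# mtcars and the heading/caption text are placeholders, not from the vignette.
r <- round(cor(mtcars[, c("mpg", "wt", "hp")]), 2)

if (requireNamespace("psych", quietly = TRUE)) {
  psych::df2latex(r,
                  heading = "Correlations among three mtcars variables",
                  caption = "Example table")
} else {
  message("psych is not installed; skipping the df2latex call")
}
```

The printed LaTeX can be pasted into a document, which is convenient when Sweave or KnitR is not being used.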


                                                                          7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
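Several of these helpers are short enough to sketch in base R. The code below shows the underlying arithmetic for the Fisher z transformation, the geometric and harmonic means, and a two-block super matrix. These are illustrative equivalents with made-up inputs, not the psych implementations.

```r
# Base R equivalents of a few helper computations (illustrative only).
r <- 0.5
z <- atanh(r)                      # Fisher z = 0.5 * log((1 + r) / (1 - r))

x <- c(1, 2, 4, 8)
geo_mean  <- exp(mean(log(x)))     # geometric mean
harm_mean <- 1 / mean(1 / x)       # harmonic mean

# A "super matrix": A top left, B lower right, zeros elsewhere.
A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 3)
super <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
               cbind(matrix(0, nrow(B), ncol(A)), B))

print(round(c(z = z, geometric = geo_mean, harmonic = harm_mean), 3))
print(super)
```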

                                                                          8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
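The kind of scaling demonstration mentioned for cities can be sketched with base R alone. The example below is an assumption-laden stand-in: it uses the built-in UScitiesD distances (10 US cities) rather than the psych cities matrix, and classical metric scaling via cmdscale rather than any psych function.

```r
# Classical multidimensional scaling of inter-city distances (base R sketch).
# UScitiesD is a base R data set used here in place of psych's cities matrix.
coords <- cmdscale(UScitiesD, k = 2)       # two-dimensional solution
colnames(coords) <- c("dim1", "dim2")

# How well do the recovered 2-D distances reproduce the input distances?
fit_r <- cor(as.vector(dist(coords)), as.vector(UScitiesD))

print(round(coords, 1))
print(fit_r)
```

Plotting coords with text labels gives a map-like configuration, possibly rotated or reflected, since distances do not determine orientation.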

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")


                                                                          10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                          11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                                                          References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

                                                                          Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

                                                                          Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

                                                                          Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

                                                                          MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

                                                                          Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

                                                                          McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

                                                                          Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

                                                                          Nunnally J C (1967) Psychometric theory McGraw-Hill New York

                                                                          54

                                                                          Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

                                                                          3rd edition

                                                                          Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

                                                                          Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

                                                                          Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

                                                                          Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

                                                                          Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

                                                                          Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

                                                                          Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

                                                                          Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

                                                                          Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

                                                                          Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

                                                                          Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

                                                                          Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

                                                                          Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang

                                                                          55

                                                                          for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

                                                                          Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

                                                                          Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

                                                                          Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

                                                                          Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

                                                                          Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

                                                                          Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

                                                                          Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

                                                                          Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

                                                                          Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

                                                                          Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

                                                                          Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

                                                                          56



Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
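The within and between group decomposition described here can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, the variable names (x, y, g) are hypothetical, the between group correlation is taken over unweighted group means, and the within group correlation is taken over deviations from group means. It is not the statsBy implementation, which reports considerably more and may weight groups differently.

```r
# Hypothetical data: 3 groups of 50 cases, two variables (names are illustrative).
set.seed(42)
g <- rep(c("a", "b", "c"), each = 50)
group_shift <- unname(c(a = -1, b = 0, c = 1)[g])  # group-level effect shared by x and y
dat <- data.frame(x = group_shift + rnorm(150),
                  y = group_shift + rnorm(150))

# Between-group part: one row of means per group.
group_means <- aggregate(dat, by = list(group = g), FUN = mean)

# Within-group part: each score as a deviation from its own group mean.
within_dev <- as.data.frame(lapply(dat, function(v) v - ave(v, g)))

r_total   <- cor(dat)                          # ignores the grouping entirely
r_between <- cor(group_means[, c("x", "y")])   # correlation of the (unweighted) group means
r_within  <- cor(within_dev)                   # pooled within-group correlation

round(c(total   = r_total[1, 2],
        between = r_between[1, 2],
        within  = r_within[1, 2]), 2)
```

Because x and y share only the group-level shift in this simulation, the total correlation mixes a strong between group relationship with a within group relationship that is close to zero, which is the situation the decomposition is meant to expose.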

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.69              0.63              0.50              0.58
     Letter.Group
             0.48

multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.48              0.40              0.25              0.34
     Letter.Group
             0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.59              0.58              0.49              0.58
     Letter.Group
             0.45

Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.34              0.34              0.24              0.33
     Letter.Group
             0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
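The quantities in this output (beta weights, multiple R, multiple R2, and the VIF) can all be derived from a correlation matrix alone, which is why raw data are not required. The following base R sketch shows the standard matrix algebra on a small, made-up correlation matrix. The matrix and the names x1, x2, and y are hypothetical, and the code illustrates the textbook formulas rather than reproducing setCor itself.

```r
# Hypothetical 3-variable correlation matrix: predictors x1, x2 and criterion y.
R <- matrix(c(1.0, 0.5, 0.6,
              0.5, 1.0, 0.4,
              0.6, 0.4, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("x1", "x2", "y"), c("x1", "x2", "y")))
x <- c("x1", "x2")
y <- "y"

Rxx <- R[x, x, drop = FALSE]   # predictor intercorrelations
Rxy <- R[x, y, drop = FALSE]   # predictor-criterion correlations

beta <- solve(Rxx, Rxy)        # standardized regression (beta) weights
R2   <- colSums(beta * Rxy)    # squared multiple correlation for each criterion
vif  <- diag(solve(Rxx))       # equals 1/(1 - SMC) for each predictor

round(beta, 2)       # 0.53 for x1, 0.13 for x2
round(sqrt(R2), 2)   # multiple R  = 0.61
round(R2, 2)         # multiple R2 = 0.37
round(vif, 2)        # 1.33 for both predictors
```

With several criterion variables, Rxy simply gains one column per criterion and the same three lines yield one column of beta weights and one R2 per criterion, as in the Thurstone example.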

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.58              0.46              0.21              0.18
     Letter.Group
             0.30

multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
            0.331             0.210             0.043             0.032
     Letter.Group
            0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.44              0.35              0.17              0.14
     Letter.Group
             0.26

Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees
             0.19              0.12              0.03              0.02
     Letter.Group
             0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

                                                                            gt round(sc$residual2)

                                                                            FourLetterWords Suffixes LetterSeries Pedigrees

                                                                            FourLetterWords 052 011 009 006

                                                                            Suffixes 011 060 -001 001

                                                                            LetterSeries 009 -001 075 028

                                                                            Pedigrees 006 001 028 066

                                                                            LetterGroup 013 003 037 020

                                                                            LetterGroup

                                                                            FourLetterWords 013

                                                                            Suffixes 003

                                                                            LetterSeries 037

                                                                            Pedigrees 020

                                                                            LetterGroup 077

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
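The path logic described above can be sketched in base R. This is a minimal illustration on simulated data and is not the code used by mediate; the variable names (x, m, y), the sample size, the helper paths, and the number of resamples (n_boot) are all assumptions made for the example. Here c_total is the total effect of x on y and c_prime is the direct effect once m is in the model.

```r
# Simulated data (an assumption for illustration): x affects m, and both affect y.
set.seed(1)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.3 * x + 0.4 * m + rnorm(n)
dat <- data.frame(x = x, m = m, y = y)

# Hypothetical helper: estimate the paths by ordinary least squares.
paths <- function(d) {
  a       <- unname(coef(lm(m ~ x, data = d))["x"])   # path from x to m
  fit_y   <- coef(lm(y ~ x + m, data = d))
  b       <- unname(fit_y["m"])                       # path from m to y, controlling for x
  c_prime <- unname(fit_y["x"])                       # direct effect of x on y
  c_total <- unname(coef(lm(y ~ x, data = d))["x"])   # total effect of x on y
  c(a = a, b = b, c_prime = c_prime, c_total = c_total, ab = a * b)
}

est <- paths(dat)

# Percentile bootstrap of the indirect effect ab: resample rows with replacement.
n_boot <- 1000
boot_ab <- replicate(n_boot, {
  idx <- sample.int(n, replace = TRUE)
  paths(dat[idx, ])["ab"]
})
ci <- quantile(boot_ab, c(0.025, 0.975))

print(round(est, 3))
print(round(ci, 3))
```

With least squares estimates from a single sample, the total effect equals the direct effect plus ab, so the indirect effect can be read either as a product of paths or as a difference between two regression weights.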


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
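As a quick check on how the pieces of this output fit together, the reported indirect effect can be recovered two ways from the printed (rounded) estimates. The object names below are assumptions chosen for the example.

```r
# Values copied from the output above, rounded to two decimals as printed.
a       <- 0.82   # THERAPY to ATTRIB
b       <- 0.40   # ATTRIB to SATIS, controlling for THERAPY
c_total <- 0.76   # total effect of THERAPY on SATIS
c_prime <- 0.43   # direct effect of THERAPY on SATIS with ATTRIB in the model

ab_product    <- a * b               # indirect effect as a product of paths
ab_difference <- c_total - c_prime   # indirect effect as total minus direct

print(round(c(product = ab_product, difference = ab_difference), 2))
```

Both routes give 0.33 after rounding, which agrees with the "Indirect effect (ab)" line of the output.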

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000.

> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model". THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS: c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct effect with Attribution in the model is .43), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models". THERAPY and ATTRIB predict SATIS; the path values shown are 0.43, 0.4, and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.
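A moderated multiple regression of the kind mentioned above adds the product of two predictors as a further predictor. The sketch below uses base R on simulated data; it is not the mediate implementation, and the variable names, the true interaction of 0.4, and the choice to center the predictors before forming the product (a common convention) are assumptions made for the example.

```r
# Simulated data (an assumption for illustration) with a true interaction of 0.4.
set.seed(5)
n <- 500
x <- rnorm(n, mean = 10)
z <- rnorm(n, mean = 5)
y <- 0.5 * x + 0.3 * z + 0.4 * (x - 10) * (z - 5) + rnorm(n)

# Center the predictors, then form the product term by hand.
xc <- x - mean(x)
zc <- z - mean(z)
xz <- xc * zc

# Regress y on both predictors and their product.
fit <- lm(y ~ xc + zc + xz)
print(round(summary(fit)$coefficients, 3))

# The formula shorthand xc * zc fits the same model.
fit_formula <- lm(y ~ xc * zc)
```

The weight for the product term (xz here) is the moderation effect: it describes how the slope of y on x changes with z.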

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{yy}^{-1} R_{yx} R_{xx}^{-1} R_{xy}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic, based upon the average canonical correlation, might be more appropriate.
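The two statistics just contrasted can be computed directly from a correlation matrix. The following is a minimal base R sketch on simulated data, not the setCor code; the data, the index vectors ix and iy, and the object names are assumptions made for the example. The eigenvalues used are the squared canonical correlations between the two sets.

```r
# Simulated data (an assumption for illustration): three x and two y variables.
set.seed(2)
n <- 300
x <- matrix(rnorm(n * 3), n, 3)
y <- cbind(0.6 * x[, 1] + rnorm(n), 0.2 * x[, 2] + rnorm(n))
R <- cor(cbind(x, y))
ix <- 1:3   # columns of the x set
iy <- 4:5   # columns of the y set

# Squared canonical correlations: eigenvalues of Ryy^-1 Ryx Rxx^-1 Rxy.
M <- solve(R[iy, iy]) %*% R[iy, ix] %*% solve(R[ix, ix]) %*% R[ix, iy]
lambda <- Re(eigen(M, only.values = TRUE)$values)

set_R2 <- 1 - prod(1 - lambda)   # Cohen's set correlation
avg_cc <- mean(lambda)           # average squared canonical correlation

# Equivalent determinant form, used here only as a numerical check.
set_R2_det <- 1 - det(R) / (det(R[ix, ix]) * det(R[iy, iy]))

print(round(c(set_R2 = set_R2, avg_cc = avg_cc), 3))

# The same two formulas applied to the squared canonical correlations
# printed for the Thurstone example (0.405 and 0.023).
thurstone_lambda <- c(0.405, 0.023)
print(round(c(set_R2 = 1 - prod(1 - thurstone_lambda),
              avg_cc = mean(thurstone_lambda)), 2))
```

Because the set correlation is at least as large as the largest squared canonical correlation, a single strong canonical variate is enough to make it large, which is why the average can be the more conservative summary.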

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. There are two things to notice. First, setCor works on the correlation matrix, the covariance matrix, or the raw data, and thus, depending upon the input, will report standardized or raw β weights. Second, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.
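To make the idea of regression from matrix input concrete, here is a minimal base R sketch showing that standardized β weights, R2, and the variance inflation factors can all be found from a correlation matrix alone. It is an illustration on simulated data rather than the setCor code, and the variable names are assumptions made for the example.

```r
# Simulated raw data (an assumption for illustration).
set.seed(3)
n <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 0.4 * x1 - 0.3 * x2 + rnorm(n)
dat <- data.frame(y = y, x1 = x1, x2 = x2)

# Everything below uses only the correlation matrix.
R   <- cor(dat)
Rxx <- R[c("x1", "x2"), c("x1", "x2")]   # predictor intercorrelations
rxy <- R[c("x1", "x2"), "y"]             # predictor-criterion correlations

beta <- solve(Rxx, rxy)     # standardized beta weights
R2   <- sum(beta * rxy)     # squared multiple correlation
vif  <- diag(solve(Rxx))    # variance inflation factors, 1/(1 - SMC)

# Check against lm() applied to the standardized raw data.
fit <- lm(y ~ x1 + x2, data = as.data.frame(scale(dat)))

print(round(rbind(from_matrix = beta, from_lm = coef(fit)[-1]), 3))
print(round(c(R2 = R2, lm_R2 = summary(fit)$r.squared), 4))
print(round(vif, 2))
```

The matrix route and lm agree on the weights and on R2. What the matrix alone cannot supply is the sample size, which is why the number of observations has to be given before significance tests can be reported.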

For this example, the analysis is done on a matrix (here the covariance matrix, C) rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: path diagram titled "Moderation model". ACT, gender, and ACTXgndr predict SATQ, with education as the mediating variable.]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03

Cohen's Set Correlation R2 = 0.09

Shrunken Set Correlation R2 = 0.08

F and df of Cohen's Set Correlation  7.26 9 1681.86

Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
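Two of the points above can be checked with a few lines of base R. The first part of this sketch illustrates the symmetry of the set correlation on a simulated correlation matrix; the helper set_cor_R2 is hypothetical and written only for this example, not taken from psych. The second part applies the standard adjusted R2 and F formulas to the R2, sample size, and number of predictors printed for ACT, which reproduces the Shrunken R2 and F values shown in both outputs.

```r
# Hypothetical helper (for illustration only): Cohen's set correlation between
# the variables indexed by ix and iy in a correlation matrix R.
set_cor_R2 <- function(R, ix, iy) {
  M <- solve(R[iy, iy, drop = FALSE]) %*% R[iy, ix, drop = FALSE] %*%
       solve(R[ix, ix, drop = FALSE]) %*% R[ix, iy, drop = FALSE]
  1 - prod(1 - Re(eigen(M, only.values = TRUE)$values))
}

# Simulated correlation matrix (an assumption for illustration).
set.seed(4)
n <- 400
x <- matrix(rnorm(n * 3), n, 3)
y <- cbind(0.5 * x[, 1] + rnorm(n), 0.3 * x[, 3] + rnorm(n))
R <- cor(cbind(x, y))

# Symmetry: swapping the roles of the two sets leaves R2 unchanged.
x_to_y <- set_cor_R2(R, 1:3, 4:5)
y_to_x <- set_cor_R2(R, 4:5, 1:3)
print(round(c(x_to_y = x_to_y, y_to_x = y_to_x), 4))

# Shrunken R2 and F for ACT, from the values printed above.
R2    <- 0.0272   # multiple R2 for ACT
n_obs <- 700      # number of observations given to setCor
k     <- 3        # number of predictors
shrunken <- 1 - (1 - R2) * (n_obs - 1) / (n_obs - k - 1)
F_stat   <- (R2 / k) / ((1 - R2) / (n_obs - k - 1))
p_value  <- pf(F_stat, k, n_obs - k - 1, lower.tail = FALSE)
print(round(c(shrunken = shrunken, F = F_stat), 4))
print(signif(p_value, 3))
```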

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
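To give a sense of what such a converter does, the sketch below turns a small matrix into the lines of a LaTeX tabular using base R only. The function matrix_to_latex is hypothetical and written for this illustration; it is not how fa2latex, cor2latex, or df2latex are implemented, and it omits the captions and APA formatting those functions provide.

```r
# Hypothetical helper (not part of psych): format a numeric matrix as the
# lines of a LaTeX tabular, with row names in the first column.
matrix_to_latex <- function(m, digits = 2) {
  body <- formatC(m, format = "f", digits = digits)   # fixed number of decimals
  dim(body) <- dim(m)                                 # make sure the shape is a matrix
  header <- paste(c("Variable", colnames(m)), collapse = " & ")
  rows <- paste(rownames(m), apply(body, 1, paste, collapse = " & "), sep = " & ")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(m)), "}"),
    "\\hline",
    paste0(header, " \\\\"),
    "\\hline",
    paste0(rows, " \\\\"),
    "\\hline",
    "\\end{tabular}")
}

# Factor intercorrelations copied from Table 2.
phi <- matrix(c(1.00, 0.59, 0.54,
                0.59, 1.00, 0.52,
                0.54, 0.52, 1.00),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("MR1", "MR2", "MR3"), c("MR1", "MR2", "MR3")))

tex <- matrix_to_latex(phi)
cat(tex, sep = "\n")
```

The resulting lines can be pasted into a LaTeX document or written to a file with writeLines.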


                                                                            7 Miscellaneous functions

                                                                            A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions

block.random creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor is one of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz converts a correlation to the corresponding Fisher z score.

geometric.mean and harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom is the same as headtail: it combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
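
The arithmetic behind a few of the helpers listed above can be sketched in base R. This is a minimal illustration, not the psych source code: every name ending in _sketch is hypothetical, and the package functions (fisherz, geometric.mean, harmonic.mean, reverse.code, superMatrix) take additional arguments that are not reproduced here.

```r
# Base R sketches of the ideas behind several psych helpers.
# All *_sketch names are hypothetical and are NOT the psych implementations.

# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r)), i.e. atanh(r)
fisher_z_sketch <- function(r) 0.5 * log((1 + r) / (1 - r))

# Geometric mean: exp of the mean of the logs (positive data, e.g. ratios)
geometric_mean_sketch <- function(x) exp(mean(log(x)))

# Harmonic mean: reciprocal of the mean of the reciprocals (e.g. rates)
harmonic_mean_sketch <- function(x) 1 / mean(1 / x)

# Reverse coding an item scored from mini to maxi: new = (maxi + mini) - old
reverse_code_sketch <- function(x, mini, maxi) (maxi + mini) - x

# "Super matrix": A on the top left, B on the lower right, 0s elsewhere
super_matrix_sketch <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

z        <- fisher_z_sketch(0.5)
gm       <- geometric_mean_sketch(c(1, 2, 4))
hm       <- harmonic_mean_sketch(c(1, 2, 4))
reversed <- reverse_code_sketch(c(1, 2, 6), mini = 1, maxi = 6)
S        <- super_matrix_sketch(matrix(1, 2, 2), matrix(2, 1, 3))
```

With psych attached, the package functions would normally be preferred; the sketch is only meant to make the definitions concrete.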

                                                                            8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
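
As a minimal sketch of how two of these data sets might be inspected, the following assumes that psych is installed and that the data set names are those listed above. Later releases may ship some data sets in a companion package, so the names should be treated as assumptions for any particular installation. The variables thurstone_dim and sat_act_rows are hypothetical and exist only for this illustration.

```r
# Guarded so the sketch is harmless where psych is not installed.
thurstone_dim <- NULL
sat_act_rows  <- NULL

if (requireNamespace("psych", quietly = TRUE)) {
  env <- new.env()
  # data() gives a warning (not an error) if a set is absent in this release
  suppressWarnings(
    utils::data(list = c("Thurstone", "sat.act"), package = "psych", envir = env)
  )
  if (exists("Thurstone", envir = env, inherits = FALSE)) {
    thurstone_dim <- dim(env$Thurstone)   # described above as a 9 x 9 matrix
  }
  if (exists("sat.act", envir = env, inherits = FALSE)) {
    sat_act_rows <- nrow(env$sat.act)     # described above as 700 subjects
  }
}
```

Loading into a separate environment keeps the workspace clean; in interactive use, data(sat.act) after library(psych) is the more common idiom.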

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
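
Installing from that repository can also be scripted. The sketch below is an assumption-laden illustration rather than an official procedure: the wrapper name install_psych_dev is hypothetical, the repository must be reachable, and installing from source needs a working build toolchain. Only the standard install.packages arguments repos and type are used, and the function is defined but not called.

```r
# Hypothetical wrapper; defined here but deliberately not run.
# Assumes the repository URL given in the text is reachable.
install_psych_dev <- function(repo = "http://personality-project.org/r") {
  install.packages("psych", repos = repo, type = "source")
}

# Usage, when an update is actually wanted:
# install_psych_dev()
```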

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")


                                                                            10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                            11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0
[9] lattice_0.20-34


                                                                            References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.




Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73
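The VIF line in this output is defined as 1/(1 - SMC), where SMC is the squared multiple correlation of each predictor with the other predictors. The following is a minimal base R sketch of that identity. The data are simulated and the names x1, x2, x3 are hypothetical; it does not reproduce the Thurstone values.

```r
# Simulated predictors; names and effect sizes are hypothetical.
set.seed(123)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)
x3 <- 0.5 * x2 + rnorm(n)
X  <- data.frame(x1, x2, x3)

R    <- cor(X)
Rinv <- solve(R)

# The diagonal of the inverse correlation matrix is the VIF of each predictor.
vif <- diag(Rinv)
# Equivalently SMC = 1 - 1/diag(R^-1), so that VIF = 1/(1 - SMC).
smc <- 1 - 1 / vif

# Check the first SMC against the R2 from regressing x1 on the other predictors.
r2_x1 <- summary(lm(x1 ~ x2 + x3, data = X))$r.squared

round(rbind(VIF = vif, SMC = smc), 2)
```

A VIF near 1, as in the two-predictor example that follows, indicates that the predictors share little variance with each other.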

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77
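As a check on this table, and assuming the residual matrix is the usual partialled matrix of the criterion set (an assumption about the computation, not something the text states), its diagonal should equal one minus the multiple R2 of each criterion on all four predictors. Using the multiple R2 values reported for the regression on variables 1-4 (0.48, 0.40, 0.25, 0.34, 0.23):

```latex
\begin{align*}
R_{yy \cdot x} &= R_{yy} - R_{yx} R_{xx}^{-1} R_{xy} \\
\operatorname{diag}(R_{yy \cdot x}) &= 1 - R^2_{y_j \cdot x}
  = (1 - 0.48,\; 1 - 0.40,\; 1 - 0.25,\; 1 - 0.34,\; 1 - 0.23) \\
  &= (0.52,\; 0.60,\; 0.75,\; 0.66,\; 0.77)
\end{align*}
```

These agree with the diagonal of the residual matrix printed above.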

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable, y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
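The logic of the a, b, and direct paths, and of bootstrapping the ab product, can be sketched in base R. This is an illustration under stated assumptions, not the implementation used by mediate: the data are simulated, the names x, m, y and the helper paths() are hypothetical, and a simple percentile bootstrap is used.

```r
# Simulated data; the names x, m, y and the path values are hypothetical.
set.seed(42)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)             # a path built into the simulation
y <- 0.3 * x + 0.4 * m + rnorm(n)   # direct and b paths built into the simulation
d <- data.frame(x = x, m = m, y = y)

# Estimate a (x -> m), b (m -> y given x) and the direct effect (x -> y given m).
paths <- function(dat) {
  a   <- coef(lm(m ~ x, data = dat))[["x"]]
  fit <- coef(lm(y ~ x + m, data = dat))
  c(a = a, b = fit[["m"]], direct = fit[["x"]], ab = a * fit[["m"]])
}

est   <- paths(d)
total <- coef(lm(y ~ x, data = d))[["x"]]   # total effect of x on y

# Percentile bootstrap of ab: resample rows with replacement and re-estimate.
boot_ab <- replicate(1000, paths(d[sample(n, replace = TRUE), ])[["ab"]])
ci <- quantile(boot_ab, c(0.025, 0.975))

round(c(est, total = total), 2)
round(ci, 2)
```

With least-squares estimates from the same sample, the total effect equals the direct effect plus ab, which is why the indirect effect can be read as the part of the total effect carried through m.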

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   SE = 0.31  t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32  t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
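The estimates in this output can be checked by hand. Writing c' for the direct effect of THERAPY with ATTRIB in the model (notation assumed here), the indirect effect is the product of the a and b paths, and the total effect is the sum of the direct and indirect effects:

```latex
\begin{align*}
ab &= a \times b = 0.82 \times 0.40 = 0.328 \approx 0.33 \\
c  &= c' + ab = 0.43 + 0.33 = 0.76
\end{align*}
```

The bootstrapped confidence interval for ab (0.04 to 0.69) excludes zero, even though the direct effect alone (t = 1.35, p = 0.19) is not reliably different from zero.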

• setCor will take raw data or a correlation matrix and find (and draw the path diagram for) the regressions of multiple y variables on multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and draw the path diagram for) the regressions of multiple y variables on multiple x variables, mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.

> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model" with nodes THERAPY, ATTRIB, and SATIS; path labels 0.82, 0.4, c = 0.76, c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (the direct effect with Attribution in the model is .43), while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models" with nodes THERAPY, ATTRIB, and SATIS; path labels 0.43, 0.4, 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$

The eigenvalues $\lambda_i$ are the squared canonical correlations between the two sets.
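The formula can be verified against the squared canonical correlations that setCor printed for the Thurstone example (0.6280, 0.1478, 0.0076, 0.0049). The second half of this base R sketch applies the same calculation to a correlation matrix, assuming the matrix takes the standard canonical correlation form Rxx^-1 Rxy Ryy^-1 Ryx; the simulated data and all variable names are hypothetical.

```r
# Squared canonical correlations as printed for the Thurstone example.
lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)
set_R2 <- 1 - prod(1 - lambda)   # Cohen's set correlation
avg_cc <- mean(lambda)           # average squared canonical correlation
round(set_R2, 2)   # 0.69
round(avg_cc, 2)   # 0.2

# The same calculation from a correlation matrix of simulated (hypothetical) data.
set.seed(1)
n <- 300
x <- matrix(rnorm(n * 2), n, 2)
y <- cbind(x[, 1] + rnorm(n), x[, 2] + rnorm(n), rnorm(n))
R <- cor(cbind(x, y))
Rxx <- R[1:2, 1:2]
Ryy <- R[3:5, 3:5]
Rxy <- R[1:2, 3:5]

M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda_sim <- sort(Re(eigen(M)$values), decreasing = TRUE)
set_R2_sim <- 1 - prod(1 - lambda_sim)

# Cross-check against base R's canonical correlations.
cc2 <- cancor(x, y)$cor^2
```

The first half reproduces the reported values of 0.69 and 0.2. Note how one large eigenvalue (0.628) dominates the product, so the set correlation is far higher than the average squared canonical correlation.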

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic, based upon the average canonical correlation, might then be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
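To see why a correlation matrix is sufficient, recall that standardized regression weights depend only on the correlations among the variables. The base R sketch below uses simulated data with hypothetical names; it shows the textbook identity and is not a description of how setCor is coded.

```r
# Simulated data; names and coefficients are hypothetical.
set.seed(7)
n  <- 400
x1 <- rnorm(n)
x2 <- 0.4 * x1 + rnorm(n)
y  <- 0.5 * x1 - 0.3 * x2 + rnorm(n)
d  <- data.frame(y, x1, x2)

R   <- cor(d)
Rxx <- R[c("x1", "x2"), c("x1", "x2")]   # predictor intercorrelations
rxy <- R[c("x1", "x2"), "y"]             # predictor-criterion correlations

beta_R <- solve(Rxx, rxy)      # standardized beta weights from R alone
R2_R   <- sum(beta_R * rxy)    # squared multiple correlation from R alone

# The same quantities from the raw data, after standardizing every variable.
z       <- as.data.frame(scale(d))
fit     <- lm(y ~ x1 + x2, data = z)
beta_lm <- coef(fit)[c("x1", "x2")]

round(rbind(from_R = beta_R, from_lm = beta_lm), 3)
```

Standard errors and significance tests additionally require the sample size, which is why the number of observations must be supplied when only a matrix is given.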

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. There are two things to notice. First, setCor works on the correlation, covariance, or raw data matrix, and thus will report standardized β weights if given a correlation matrix and raw β weights if given a covariance matrix. Second, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the covariance matrix C rather than on the raw data.

                                                                              gt C lt- cov(satactuse=pairwise)

                                                                              gt model1 lt- lm(ACT~ gender + education + age data=satact)

                                                                              gt summary(model1)

                                                                              Call

                                                                              lm(formula = ACT ~ gender + education + age data = satact)

                                                                              Residuals

                                                                              44

                                                                              Call mediate(y = c(SATQ) x = c(ACT) m = education data = satact

                                                                              mod = gender niter = 50 std = TRUE)

                                                                              The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

                                                                              Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

                                                                              Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

                                                                              Indirect effect (ab) of ACT on SATQ through education = -001

                                                                              Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

                                                                              Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

                                                                              Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

                                                                              Indirect effect (ab) of gender on SATQ through education = 0

                                                                              Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

                                                                              Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

                                                                              Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

                                                                              Indirect effect (ab) of ACTXgndr on SATQ through education = 0

                                                                              Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

                                                                              R2 of model = 037

                                                                              To see the longer output specify short = FALSE in the print statement

                                                                              Full output

                                                                              Total effect estimates (c)

                                                                              SATQ se t Prob

                                                                              ACT 058 003 1925 000e+00

                                                                              gender -014 003 -478 210e-06

                                                                              ACTXgndr 000 003 002 985e-01

                                                                              Direct effect estimates (c)SATQ se t Prob

                                                                              ACT 059 003 1926 000e+00

                                                                              gender -014 003 -463 437e-06

                                                                              ACTXgndr 000 003 001 992e-01

                                                                              a effect estimates

                                                                              education se t Prob

                                                                              ACT 016 004 422 277e-05

                                                                              gender 009 004 250 128e-02

                                                                              ACTXgndr -001 004 -015 883e-01

                                                                              b effect estimates

                                                                              SATQ se t Prob

                                                                              education -004 003 -145 0147

                                                                              ab effect estimates

                                                                              SATQ boot sd lower upper

                                                                              ACT -001 -001 001 0 0

                                                                              gender 000 000 000 0 0

                                                                              ACTXgndr 000 000 000 0 0

                                                                              Moderation model

                                                                              ACT

                                                                              gender

                                                                              ACTXgndr

                                                                              SATQ

                                                                              education016 c = 058

                                                                              c = 059

                                                                              009 c = minus014

                                                                              c = minus014

                                                                              minus001 c = 0

                                                                              c = 0

                                                                              minus004

                                                                              minus004

                                                                              minus007

                                                                              002

                                                                              Figure 18 Moderated multiple regression requires the raw data

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R² is the same independent of the direction of the relationship.
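The symmetry can be checked directly from a correlation matrix. The following is a minimal sketch rather than the setCor implementation: it assumes Cohen's definition of the set correlation, R² = 1 - |R| / (|Rx| |Ry|), uses the built-in mtcars data instead of sat.act, and the helper name set_r2 and the choice of variables are purely illustrative.

```r
# Illustrative sketch with the built-in mtcars data (not the sat.act example).
x <- c("hp", "wt")             # predictor set (arbitrary choice)
y <- c("mpg", "qsec", "drat")  # criterion set (arbitrary choice)
R <- cor(mtcars[, c(x, y)])

# Hypothetical helper: Cohen's set correlation, 1 - |R| / (|Ra| * |Rb|)
set_r2 <- function(R, a, b) {
  Ra <- R[a, a, drop = FALSE]
  Rb <- R[b, b, drop = FALSE]
  1 - det(R[c(a, b), c(a, b)]) / (det(Ra) * det(Rb))
}

r2_xy <- set_r2(R, x, y)   # criterion set y, predictor set x
r2_yx <- set_r2(R, y, x)   # roles of the two sets reversed

# The same quantity from the canonical correlations: 1 - prod(1 - rho^2)
rho <- cancor(mtcars[, x], mtcars[, y])$cor
r2_cc <- 1 - prod(1 - rho^2)

print(round(c(r2_xy = r2_xy, r2_yx = r2_yx, r2_cc = r2_cc), 3))
```

All three values agree, whichever set is treated as the predictors. Applying the same product rule to the squared canonical correlations reported by setCor (0.050, 0.033, 0.008) gives 1 - (0.950)(0.967)(0.992), or about 0.09, in line with the reported set correlation.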

6 Converting output to APA style tables using LaTeX

Although for most purposes using Sweave or knitr produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
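To make the idea behind these converters concrete, here is a small base R sketch that writes the lower triangle of a correlation matrix as a LaTeX tabular. It is not the code used by cor2latex; the helper cor_to_latex is hypothetical, the mtcars variables are an arbitrary choice, and the psych functions offer more options (APA headings, captions, font size) than this sketch does.

```r
# Hypothetical helper (not part of psych): lower triangle of a correlation
# matrix, returned as the lines of a LaTeX tabular environment.
cor_to_latex <- function(R, digits = 2) {
  stopifnot(is.matrix(R), nrow(R) == ncol(R))
  cells <- matrix(formatC(R, format = "f", digits = digits),
                  nrow = nrow(R), dimnames = dimnames(R))
  cells[upper.tri(cells)] <- ""            # blank out the upper triangle
  header <- paste(c("Variable", colnames(R)), collapse = " & ")
  body <- paste(rownames(R),
                apply(cells, 1, paste, collapse = " & "),
                sep = " & ")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(R)), "}"),
    "\\hline",
    paste0(header, " \\\\"),
    "\\hline",
    paste0(body, " \\\\"),
    "\\hline",
    "\\end{tabular}")
}

R <- cor(mtcars[, c("mpg", "hp", "wt")])   # illustrative data only
tab <- cor_to_latex(R)
cat(tab, sep = "\n")
```

The printed lines can be pasted into a LaTeX document or written to a file with writeLines.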

                                                                              7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Converts a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
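Several of these helpers wrap short calculations. The sketch below writes a few of them out in base R to show the arithmetic; it assumes, without quoting the psych source, that fisherz, geometric.mean, harmonic.mean and superMatrix compute these same quantities, and the input values are made up for illustration.

```r
# Base R versions of a few of the calculations listed above.
# The psych functions named in the comments are assumed to be equivalent.

r <- 0.5
z <- atanh(r)                    # Fisher z = 0.5 * log((1 + r) / (1 - r)); cf. fisherz

x <- c(1, 2, 4)
geo_mean  <- exp(mean(log(x)))   # geometric mean; cf. geometric.mean
harm_mean <- 1 / mean(1 / x)     # harmonic mean; cf. harmonic.mean

# "Super matrix": A on the top left, B on the lower right, 0s elsewhere;
# cf. superMatrix
A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 1)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))

print(c(z = z, geo_mean = geo_mean, harm_mean = harm_mean))
print(S)
```

Writing the block matrix by hand shows why the construction is handy for scoring keys: items from different scales never share a nonzero column.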

                                                                              8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
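As an illustration of the multidimensional scaling use mentioned for cities, the following sketch applies classical scaling to a distance matrix. It uses UScitiesD, a 10-city distance object that ships with base R, as a stand-in; it is not the 11-city cities matrix from psych, and the two-dimensional solution is an assumption made for the example.

```r
# Stand-in data: UScitiesD (distances between 10 US cities) is part of base R;
# it is not the 11-city `cities` matrix supplied with psych.
coords <- cmdscale(UScitiesD, k = 2)   # classical (metric) MDS in 2 dimensions

# How well do the distances in the 2-D solution track the original distances?
fit <- cor(as.vector(dist(coords)), as.vector(UScitiesD))

print(round(coords))
print(fit)
```

Plotting the two columns of coords with the city names as labels recovers a map of the cities, up to rotation and reflection.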

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")

                                                                              10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                              11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

                                                                              References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

                                                                              Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang

                                                                              55

                                                                              for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

                                                                              Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

                                                                              Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

                                                                              Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

                                                                              Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

                                                                              Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

                                                                              Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

                                                                              Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

                                                                              Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

                                                                              Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

                                                                              Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

                                                                              Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144



multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x_1, x_2, ..., x_i) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
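The path logic above can be sketched in base R without the psych package. This is only an illustration under stated assumptions: the data are simulated, the variable names x, m and y and the helper indirect_effect() are hypothetical, and the percentile bootstrap shown here is one common choice rather than a description of what mediate does internally.

```r
# Simulated data; the names x, m, y are illustrative only.
set.seed(42)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.3 * x + 0.4 * m + rnorm(n)
dat <- data.frame(x = x, m = m, y = y)

# Product-of-coefficients estimate of the indirect effect (hypothetical helper).
indirect_effect <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]      # path a: x -> m
  b <- coef(lm(y ~ x + m, data = d))[["m"]]  # path b: m -> y, holding x constant
  a * b
}

total_effect  <- coef(lm(y ~ x, data = dat))[["x"]]      # x -> y ignoring m
direct_effect <- coef(lm(y ~ x + m, data = dat))[["x"]]  # x -> y with m in the model
ab            <- indirect_effect(dat)

# Percentile bootstrap: resample rows with replacement and recompute ab.
n_boot  <- 1000
boot_ab <- replicate(n_boot, indirect_effect(dat[sample(n, replace = TRUE), ]))
ci      <- quantile(boot_ab, c(0.025, 0.975))

round(c(total = total_effect, direct = direct_effect, indirect = ab), 3)
round(ci, 3)
```

With ordinary least squares the total effect equals the direct effect plus ab exactly, which is a useful sanity check on any mediation output.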

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76 SE = 0.31 t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43 SE = 0.32 t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17 Lower CI = 0.04 Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se    t   Prob
THERAPY  0.76 0.31  2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) the regressions of multiple y variables on multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables, mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables, mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50

> mediate.diagram(preacher)

[Figure: "Mediation model". Paths shown: THERAPY to ATTRIB = 0.82; ATTRIB to SATIS = 0.4; THERAPY to SATIS c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The total effect of Therapy on Satisfaction is 0.76 (the direct effect is 0.43 once Attribution is included), while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2, 3), sobel, std = FALSE)

> setCor.diagram(preacher)

[Figure: "Regression Models". Paths shown: THERAPY to SATIS = 0.43; ATTRIB to SATIS = 0.4; value between the predictors THERAPY and ATTRIB = 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

for speed. The default number of bootstrap iterations is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{xx}^{-1} R_{xy}^{-1}$$
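A minimal base R sketch of the product formula follows. It rests on assumptions that are not in the text: the data are simulated, and the eigenvalues are taken from the standard canonical-correlation matrix R_xx^{-1} R_xy R_yy^{-1} R_yx, which differs from the matrix product as printed. The result is cross-checked against a determinant identity and against stats::cancor. It is not meant to reproduce the internals of setCor.

```r
# Simulated data: three x variables and two y variables sharing a factor f.
set.seed(1)
n <- 300
f <- rnorm(n)
X <- cbind(x1 = f + rnorm(n), x2 = f + rnorm(n), x3 = rnorm(n))
Y <- cbind(y1 = f + rnorm(n), y2 = rnorm(n))

R   <- cor(cbind(X, Y))
ix  <- 1:3                 # columns of the x set
iy  <- 4:5                 # columns of the y set
Rxx <- R[ix, ix]
Ryy <- R[iy, iy]
Rxy <- R[ix, iy]

# Assumed standard form: the non-zero eigenvalues are the squared canonical correlations.
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
k      <- min(length(ix), length(iy))
lambda <- sort(Re(eigen(M)$values), decreasing = TRUE)[1:k]

set_R2 <- 1 - prod(1 - lambda)                    # product formula
det_R2 <- 1 - det(R) / (det(Rxx) * det(Ryy))      # equivalent determinant form
cc     <- cancor(X, Y)$cor                        # canonical correlations from stats

round(c(set_R2 = set_R2, det_R2 = det_R2), 4)
round(rbind(lambda = lambda, cancor_squared = cc^2), 4)
```

The two estimates of the set correlation agree, and the eigenvalues match the squared canonical correlations, which is why the later output reports "Squared Canonical Correlations" alongside Cohen's set correlation.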

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
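As a small worked check, the two statistics can be compared using the squared canonical correlations reported in the setCor output shown in this section. The only assumption is that the printed values are read as 0.405 and 0.023, that is, with their decimal points restored.

```r
# Squared canonical correlations from the setCor output in this section
# (decimal points restored; an assumption about the extracted text).
lambda <- c(0.405, 0.023)

set_R2  <- 1 - prod(1 - lambda)  # Cohen's set correlation R2
mean_cc <- mean(lambda)          # average squared canonical correlation

round(c(set_R2 = set_R2, average = mean_cc), 2)
# set_R2 = 0.42, average = 0.21
```

A single large canonical correlation drives the set correlation to 0.42, roughly twice the average of 0.21, which is the inflation the paragraph above warns about.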

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
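To show why a correlation matrix is sufficient, here is a minimal base R sketch that recovers standardized regression weights and the squared multiple correlation from correlations alone. The mtcars data set and the choice of variables are assumptions made for illustration; this is the textbook matrix algebra, not a description of how setCor is implemented.

```r
# mtcars ships with base R; the variables chosen here are purely illustrative.
vars <- c("mpg", "wt", "hp", "qsec")
R    <- cor(mtcars[, vars])

x <- c("wt", "hp", "qsec")  # predictors
y <- "mpg"                  # criterion

beta <- solve(R[x, x], R[x, y])  # standardized weights: Rxx^-1 rxy
R2   <- sum(beta * R[x, y])      # squared multiple correlation

# Cross-check against lm() fitted to standardized raw data.
z   <- as.data.frame(scale(mtcars[, vars]))
fit <- lm(mpg ~ wt + hp + qsec, data = z)

round(cbind(from_R = unname(beta), from_lm = unname(coef(fit)[x])), 4)
round(c(R2_from_R = R2, R2_from_lm = summary(fit)$r.squared), 4)
```

Both routes give the same weights and the same R2; what the correlation matrix alone cannot supply is the sample size, which is why the number of observations must be specified before significance tests can be reported.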

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: first, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")

> model1 <- lm(ACT ~ gender + education + age, data = sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

                                                                                Call mediate(y = c(SATQ) x = c(ACT) m = education data = satact

                                                                                mod = gender niter = 50 std = TRUE)

                                                                                The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

                                                                                Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

                                                                                Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

                                                                                Indirect effect (ab) of ACT on SATQ through education = -001

                                                                                Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

                                                                                Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

                                                                                Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

                                                                                Indirect effect (ab) of gender on SATQ through education = 0

                                                                                Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

                                                                                Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

                                                                                Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

                                                                                Indirect effect (ab) of ACTXgndr on SATQ through education = 0

                                                                                Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

                                                                                R2 of model = 037

                                                                                To see the longer output specify short = FALSE in the print statement

                                                                                Full output

                                                                                Total effect estimates (c)

                                                                                SATQ se t Prob

                                                                                ACT 058 003 1925 000e+00

                                                                                gender -014 003 -478 210e-06

                                                                                ACTXgndr 000 003 002 985e-01

                                                                                Direct effect estimates (c)SATQ se t Prob

                                                                                ACT 059 003 1926 000e+00

                                                                                gender -014 003 -463 437e-06

                                                                                ACTXgndr 000 003 001 992e-01

                                                                                a effect estimates

                                                                                education se t Prob

                                                                                ACT 016 004 422 277e-05

                                                                                gender 009 004 250 128e-02

                                                                                ACTXgndr -001 004 -015 883e-01

                                                                                b effect estimates

                                                                                SATQ se t Prob

                                                                                education -004 003 -145 0147

                                                                                ab effect estimates

                                                                                SATQ boot sd lower upper

                                                                                ACT -001 -001 001 0 0

                                                                                gender 000 000 000 0 0

                                                                                ACTXgndr 000 000 000 0 0

                                                                                Moderation model

                                                                                ACT

                                                                                gender

                                                                                ACTXgndr

                                                                                SATQ

                                                                                education016 c = 058

                                                                                c = 059

                                                                                009 c = minus014

                                                                                c = minus014

                                                                                minus001 c = 0

                                                                                c = 0

                                                                                minus004

                                                                                minus004

                                                                                minus007

                                                                                002

                                                                                Figure 18 Moderated multiple regression requires the raw data

                                                                                45

                                                                                Min 1Q Median 3Q Max

                                                                                -252458 -32133 07769 35921 92630

                                                                                Coefficients

                                                                                Estimate Std Error t value Pr(gt|t|)

                                                                                (Intercept) 2741706 082140 33378 lt 2e-16

                                                                                gender -048606 037984 -1280 020110

                                                                                education 047890 015235 3143 000174

                                                                                age 001623 002278 0712 047650

                                                                                ---

                                                                                Signif codes 0 0001 001 005 01 1

                                                                                Residual standard error 4768 on 696 degrees of freedom

                                                                                Multiple R-squared 00272 Adjusted R-squared 002301

                                                                                F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare the lm output with the output from setCor:

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the $R^2$ is the same independent of the direction of the relationship.
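The shrunken $R^2$ and F values reported for ACT can be checked by hand from the printed multiple $R^2$, the number of observations, and the number of predictors. The sketch below assumes the usual adjusted $R^2$ formula and the standard overall F test for a regression; that setCor uses exactly this shrinkage formula is an assumption, although the results agree with the printed values to the precision shown.

```r
R2 <- 0.0272   # multiple R2 for ACT, as printed in the output
n  <- 700      # number of observations (n.obs)
k  <- 3        # predictors: gender, education, age

df2      <- n - k - 1                              # residual degrees of freedom
shrunken <- 1 - (1 - R2) * (n - 1) / df2           # adjusted (shrunken) R2
F_stat   <- (R2 / k) / ((1 - R2) / df2)            # overall F statistic
p_value  <- pf(F_stat, k, df2, lower.tail = FALSE) # upper-tail probability of F

round(c(shrunken = shrunken, F = F_stat), 3)
signif(p_value, 3)
```

These are the same quantities that summary reports for the lm fit as the adjusted R-squared, the F-statistic, and its p-value.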

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
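To give a feel for what such a conversion involves, here is a minimal base R sketch that writes the lower triangle of a correlation matrix as a LaTeX tabular, using the factor intercorrelations from Table 2. The helper name lower_to_latex and its layout choices are hypothetical; it is a hand-rolled illustration, not the code behind cor2latex or fa2latex, whose output is more complete.

```r
# Hypothetical helper: format the lower triangle of a correlation matrix
# as a LaTeX tabular. Not the psych implementation of cor2latex.
lower_to_latex <- function(R, digits = 2) {
  stopifnot(is.matrix(R), nrow(R) == ncol(R))
  cells <- formatC(R, format = "f", digits = digits)  # keeps matrix shape
  cells[upper.tri(cells)] <- ""                       # blank the upper triangle
  header <- paste(c("Variable", colnames(R)), collapse = " & ")
  body <- paste(rownames(R),
                apply(cells, 1, paste, collapse = " & "),
                sep = " & ")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(R)), "}"),
    "\\hline",
    paste0(header, " \\\\"),
    "\\hline",
    paste0(body, " \\\\"),
    "\\hline",
    "\\end{tabular}")
}

# Factor intercorrelations from Table 2.
Phi <- matrix(c(1.00, 0.59, 0.54,
                0.59, 1.00, 0.52,
                0.54, 0.52, 1.00), nrow = 3, byrow = TRUE,
              dimnames = list(c("MR1", "MR2", "MR3"),
                              c("MR1", "MR2", "MR3")))

tex <- lower_to_latex(Phi)
cat(tex, sep = "\n")
```

The result is a character vector with one element per line, so it can be printed with cat or written to a file with writeLines and then included in a LaTeX document.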


                                                                                7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
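Several of these helpers are short enough that the underlying arithmetic can be sketched in base R. The functions below are illustrative stand-ins with made-up names (fisher_z, geo_mean, harm_mean, super_matrix); they show the calculations described in the list and are not the psych implementations, which handle more cases (missing data, more than two matrices, and so on).

```r
# Fisher r-to-z transformation; algebraically the same as atanh(r).
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))

# Geometric and harmonic means for strictly positive data.
geo_mean  <- function(x) exp(mean(log(x)))
harm_mean <- function(x) 1 / mean(1 / x)

# Block-diagonal "super matrix": A top left, B lower right, 0s elsewhere.
super_matrix <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

z  <- fisher_z(0.5)
gm <- geo_mean(c(1, 2, 4))     # cube root of 1 * 2 * 4
hm <- harm_mean(c(1, 2, 4))    # 3 / (1 + 1/2 + 1/4)
S  <- super_matrix(diag(2), matrix(1, 2, 3))
S
```

In practice the psych versions are preferable; the sketch is only meant to make clear what each function returns.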

                                                                                8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

                                                                                9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package = "psych")


                                                                                10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                                11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                                                                References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se    t   Prob
THERAPY  0.76 0.31  2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t   Prob
THERAPY  0.43 0.32 1.35  0.190
ATTRIB   0.40 0.18 2.23  0.034

'a' effect estimates
       THERAPY   se    t   Prob
ATTRIB    0.82  0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
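The estimates in this output are tied together by the usual single-mediator decomposition: the total effect c equals the direct effect c' plus the indirect effect ab (here 0.43 + 0.82 × 0.40 ≈ 0.76). The following is a minimal base-R sketch of that decomposition using three lm fits. It runs on simulated data rather than the sobel data, and the variable names and generating coefficients are assumptions chosen only for illustration; it is not how mediate is implemented.

```r
# Illustrative only: simulated data, not the sobel data set used above.
set.seed(42)
n <- 200
x <- rnorm(n)                       # predictor (plays the role of THERAPY)
m <- 0.8 * x + rnorm(n)             # mediator  (plays the role of ATTRIB)
y <- 0.4 * x + 0.4 * m + rnorm(n)   # outcome   (plays the role of SATIS)

total_fit  <- lm(y ~ x)        # total effect c
a_fit      <- lm(m ~ x)        # a path: x -> m
direct_fit <- lm(y ~ x + m)    # direct effect c' and b path: m -> y

c_total  <- unname(coef(total_fit)["x"])
a_path   <- unname(coef(a_fit)["x"])
c_direct <- unname(coef(direct_fit)["x"])
b_path   <- unname(coef(direct_fit)["m"])
ab       <- a_path * b_path    # indirect effect

# With ordinary least squares on one sample, c = c' + ab holds exactly.
round(c(c = c_total, c_prime = c_direct, a = a_path, b = b_path, ab = ab), 2)
all.equal(c_total, c_direct + ab)
```

The point estimates come straight from the regressions; only the standard error and confidence limits of ab require resampling.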

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50


> mediate.diagram(preacher)

[Figure 16 plot: path diagram titled "Mediation model" with paths THERAPY → ATTRIB = 0.82, ATTRIB → SATIS = 0.4, and THERAPY → SATIS c = 0.76, c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (c), the direct effect after removing Attribution is .43 (c'), and the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


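> preacher <- setCor(1, c(2, 3), sobel, std = FALSE)

> setCor.diagram(preacher)

[Figure 17 plot: path diagram titled "Regression Models" with paths THERAPY → SATIS = 0.43 and ATTRIB → SATIS = 0.4, and the value 0.21 linking the predictors THERAPY and ATTRIB]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.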

                                                                                  43

for speed. The default number of bootstraps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the $i$th eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic based upon the average canonical correlation might be more appropriate.
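The definition can be illustrated directly with matrix algebra. The following is a minimal sketch in base R, not the setCor implementation: it uses simulated data (an assumption made only for illustration), takes the eigenvalues of the product of the inverse within-set matrices and the between-set correlations (the squared canonical correlations), and compares the set correlation with the average squared canonical correlation.

```r
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), n, 3)                # predictor set (3 variables)
y <- cbind(0.5 * x[, 1] + rnorm(n), rnorm(n))  # criterion set; only y1 depends on x

R   <- cor(cbind(x, y))
ix  <- 1:3
iy  <- 4:5
Rxx <- R[ix, ix]
Ryy <- R[iy, iy]
Rxy <- R[ix, iy]

# The nonzero eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared
# canonical correlations between the two sets
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M)$values)[1:min(length(ix), length(iy))]

set_R2  <- 1 - prod(1 - lambda)  # set correlation as defined above
avg_cc2 <- mean(lambda)          # average squared canonical correlation

# Exchanging the roles of the two sets leaves the set correlation unchanged
M_rev      <- solve(Ryy) %*% t(Rxy) %*% solve(Rxx) %*% Rxy
set_R2_rev <- 1 - prod(1 - Re(eigen(M_rev)$values))

round(c(set_R2 = set_R2, reversed = set_R2_rev, average = avg_cc2), 3)
```

Because the product runs over every canonical correlation, the set correlation can never be smaller than the largest squared canonical correlation, which is why a single strongly related pair of variables is enough to make it large.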

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

Figure 18: Moderated multiple regression requires the raw data. (Path diagram titled "Moderation model": ACT, gender, and ACTXgndr predict SATQ, with education as the mediator.)

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
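Several of the printed statistics can be recovered from one another, which is a useful check when reading the output. The following is a minimal sketch in base R that starts from the rounded values printed above (so agreement is only to rounding error); the variable names are illustrative and are not elements of the setCor result.

```r
n <- 700   # number of observations (n.obs)
k <- 3     # number of predictors: gender, education, age

# Multiple R2 for ACT, as printed by both lm and setCor
R2_ACT <- 0.0272

# F test of R2 with k and n - k - 1 degrees of freedom
F_ACT <- (R2_ACT / k) / ((1 - R2_ACT) / (n - k - 1))

# Shrunken (adjusted) R2
shrunken_ACT <- 1 - (1 - R2_ACT) * (n - 1) / (n - k - 1)

# Set correlation and average from the squared canonical correlations
cc2     <- c(0.050, 0.033, 0.008)
set_R2  <- 1 - prod(1 - cc2)
avg_cc2 <- mean(cc2)

round(c(F = F_ACT, shrunken = shrunken_ACT,
        set_R2 = set_R2, average = avg_cc2), 4)
```

To two decimals these reproduce the F of 6.49 and shrunken R2 of 0.0230 for ACT, the set correlation of 0.09, and the average squared canonical correlation of 0.03.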

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

                  MR1    MR2    MR3
MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
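To make concrete what converting a data frame to a LaTeX table involves, the following is a minimal sketch in base R. The helper df_to_tabular is hypothetical and is not part of psych; it only illustrates the kind of text that a function such as df2latex has to generate, using the first two rows of Table 2 as input.

```r
# Hypothetical helper (not a psych function): format a numeric data frame
# as the lines of a bare LaTeX tabular, rounded to `digits` decimals.
df_to_tabular <- function(df, digits = 2) {
  cells  <- format(round(as.matrix(df), digits), nsmall = digits)
  body   <- apply(cells, 1, paste, collapse = " & ")
  rows   <- paste0(rownames(df), " & ", body, " \\\\")
  header <- paste0("Variable & ", paste(colnames(df), collapse = " & "), " \\\\")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(df)), "}"),
    "\\hline", header, "\\hline", rows, "\\hline",
    "\\end{tabular}")
}

# Loadings for the first two variables of Table 2
loadings <- data.frame(MR1 = c(0.91, 0.89),
                       MR2 = c(-0.04, 0.06),
                       MR3 = c(0.04, -0.03),
                       row.names = c("Sentences", "Vocabulary"))

tab <- df_to_tabular(loadings)
writeLines(tab)
```

The psych functions add the APA-specific details (caption, font size, suppression of small loadings, and so on) that this sketch leaves out.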

                                                                                  7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Converts a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
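A few of these helpers correspond to very short computations. The following is a minimal sketch in base R of the arithmetic behind fisherz, geometric.mean, harmonic.mean, and superMatrix; it deliberately does not call psych, so the variable names are illustrative and the psych versions may handle missing data and multiple inputs differently.

```r
# Fisher r-to-z transformation (the computation behind fisherz)
r <- 0.5
z <- 0.5 * log((1 + r) / (1 - r))   # same value as atanh(r)

# Geometric and harmonic means (cf. geometric.mean and harmonic.mean)
v    <- c(1, 2, 4)
geo  <- exp(mean(log(v)))           # geometric mean of positive values
harm <- 1 / mean(1 / v)             # harmonic mean

# A "super matrix": A on the top left, B on the lower right, 0s elsewhere
A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))

z
c(geometric = geo, harmonic = harm)
S
```

The block layout of S is what makes it convenient for scoring keys: each block keeps its own items, and the zero quadrants ensure that items from one block never contribute to the other.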

                                                                                  8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")

                                                                                  10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                                  11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0
[9] lattice_0.20-34


                                                                                  References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.




> mediate.diagram(preacher)

[Figure 16 graphic: path diagram titled "Mediation model" with nodes THERAPY, ATTRIB, and SATIS; path labels 0.82, c = 0.76, c' = 0.43, and 0.4.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004 and solved using the mediate function. The total effect of Therapy on Satisfaction is .76 (c), of which .43 (c') is the direct effect, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.
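The arithmetic behind the figure (0.82 × 0.4 ≈ .33, and .43 + .33 = .76) can be checked with ordinary regressions. What follows is a minimal sketch in base R rather than the mediate function itself; it uses the mtcars data that ships with R as a stand-in, not the Preacher and Hayes data, and the choice of variables is purely illustrative.

```r
# Illustrative data only: mtcars ships with base R and is NOT the
# Preacher and Hayes (2004) data used in the figure.
x <- mtcars$wt    # stand-in for the predictor (THERAPY)
m <- mtcars$hp    # stand-in for the mediator (ATTRIB)
y <- mtcars$mpg   # stand-in for the outcome (SATIS)

a       <- unname(coef(lm(m ~ x))["x"])   # path X -> M
c_total <- unname(coef(lm(y ~ x))["x"])   # total effect c
fit     <- lm(y ~ x + m)
c_prime <- unname(coef(fit)["x"])         # direct effect c'
b       <- unname(coef(fit)["m"])         # path M -> Y, controlling for X

indirect <- a * b                         # indirect effect ab

# With ordinary least squares the total effect decomposes as c = c' + ab
decomposition_holds <- isTRUE(all.equal(c_total, c_prime + indirect))
```

The decomposition gives point estimates only; confidence intervals for the indirect effect need the bootstrap, which is what mediate provides.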


> preacher <- setCor(1,c(2,3),sobel,std=FALSE)

> setCor.diagram(preacher)

[Figure 17 graphic: path diagram titled "Regression Models" with nodes THERAPY, ATTRIB, and SATIS; path labels 0.43, 0.4, and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.


for speed. The default number of bootstraps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n}(1-\lambda_i)$$

where $\lambda_i$ is the $i$th eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1}R_{xy}R_{yy}^{-1}R_{yx}$$
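As a rough illustration of the formula above, the following base R sketch computes a set correlation from eigenvalues for a small made-up correlation matrix. It assumes the usual canonical correlation form of the matrix, with the x-set and y-set blocks named Rxx, Ryy, and Rxy. The matrix values and the names R, x, y, and set_R2 are chosen for illustration only; this is not the setCor source, which may organize the computation differently.

```r
# Hypothetical 4 x 4 correlation matrix: variables 1-2 form set x, 3-4 form set y.
R <- matrix(c(1.0, 0.3, 0.4, 0.2,
              0.3, 1.0, 0.1, 0.5,
              0.4, 0.1, 1.0, 0.3,
              0.2, 0.5, 0.3, 1.0), nrow = 4, byrow = TRUE)
x <- 1:2
y <- 3:4

Rxx <- R[x, x]   # correlations within the x set
Ryy <- R[y, y]   # correlations within the y set
Rxy <- R[x, y]   # correlations between the two sets

# Eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations.
M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M)$values)

# Set correlation: 1 minus the product of the (1 - lambda_i).
set_R2 <- 1 - prod(1 - lambda)

# Swapping the roles of x and y should give the same value.
M_rev <- solve(Ryy) %*% t(Rxy) %*% solve(Rxx) %*% Rxy
set_R2_rev <- 1 - prod(1 - Re(eigen(M_rev)$values))

print(c(set_R2 = set_R2, set_R2_rev = set_R2_rev))
```

The same quantity can be cross-checked with determinants, since 1 - det(R) / (det(Rxx) * det(Ryy)) is algebraically equal to the eigenvalue form.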

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
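To make the idea of regression from a correlation matrix concrete, here is a minimal base R sketch using the built-in mtcars data, which is not a data set used in this document. It solves for standardized weights from the correlation matrix alone and then recovers the same weights from z-scored raw data. The variable choices are arbitrary, and this only illustrates the principle; it is not how setCor is necessarily implemented.

```r
# Correlation matrix of an outcome (mpg) and two predictors (wt, hp).
vars <- c("mpg", "wt", "hp")
R <- cor(mtcars[, vars])

Rxx <- R[c("wt", "hp"), c("wt", "hp")]   # predictor intercorrelations
rxy <- R[c("wt", "hp"), "mpg"]           # predictor-criterion correlations

# Standardized regression weights and squared multiple correlation,
# computed from the correlations only.
beta <- solve(Rxx, rxy)
R2 <- sum(beta * rxy)

# The same weights from raw data: regress z-scored mpg on z-scored predictors.
z <- as.data.frame(scale(mtcars[, vars]))
fit <- lm(mpg ~ wt + hp, data = z)

print(round(beta, 3))
print(round(coef(fit)[c("wt", "hp")], 3))
```

Because only correlations enter the first computation, significance tests additionally need the number of observations, which is why setCor asks for it when given matrix input.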

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Path diagram "Moderation model": ACT, gender, and ACTXgndr predict SATQ, with education as the mediator]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or knitr packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
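To give a feel for what a data-frame-to-LaTeX conversion involves, the sketch below builds a minimal tabular environment in base R from the first two rows of Table 2. The helper df_to_latex is hypothetical and written only for this illustration; it is not the psych implementation, and df2latex and fa2latex offer many more options (captions, APA style rules, cut-offs for small loadings, and so on).

```r
# Hypothetical helper: render a data frame as a minimal LaTeX tabular.
df_to_latex <- function(df, digits = 2) {
  # Format numeric columns to a fixed number of decimals; keep others as text.
  cells <- lapply(df, function(col) {
    if (is.numeric(col)) formatC(col, format = "f", digits = digits)
    else as.character(col)
  })
  # Join the cells of each row with " & " (unname avoids argument-name clashes).
  body <- do.call(paste, c(unname(cells), sep = " & "))
  header <- paste(names(df), collapse = " & ")
  align <- paste(rep("r", ncol(df)), collapse = "")
  c(paste0("\\begin{tabular}{", align, "}"),
    "\\hline",
    paste(header, "\\\\"),
    "\\hline",
    paste(body, "\\\\"),
    "\\hline",
    "\\end{tabular}")
}

# Two rows and two factors taken from Table 2.
loadings <- data.frame(Variable = c("Sentences", "Vocabulary"),
                       MR1 = c(0.91, 0.89),
                       MR2 = c(-0.04, 0.06))
tex <- df_to_latex(loadings)
cat(tex, sep = "\n")
```

The printed lines can be pasted into a LaTeX document as they are; in practice the psych functions are the more convenient route.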


                                                                                    7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
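Several of the helpers listed above are simple enough to sketch in a few lines of base R. The functions below (fisher_z, geo_mean, harm_mean, super_matrix) are hypothetical stand-ins written for this illustration, not the psych code; the package versions additionally handle matrices, data frames, missing values, and dimension names.

```r
# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r)) = atanh(r).
fisher_z <- function(r) atanh(r)

# Geometric and harmonic means of a vector of positive numbers.
geo_mean <- function(x) exp(mean(log(x)))
harm_mean <- function(x) 1 / mean(1 / x)

# Block-diagonal "super matrix": A top left, B lower right, zeros elsewhere.
super_matrix <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

x <- c(1, 2, 4)
A <- matrix(1, 2, 2)   # 2 x 2 block of ones
B <- matrix(2, 1, 3)   # 1 x 3 block of twos
S <- super_matrix(A, B)

print(fisher_z(0.5))
print(c(geometric = geo_mean(x), harmonic = harm_mean(x)))
print(S)
```

The block-diagonal layout is what makes the trick useful for scoring keys: items belonging to one set of scales get zero weights on the scales of the other set.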

                                                                                    8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")


                                                                                    10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                                    11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0
[9] lattice_0.20-34


                                                                                    References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.




> preacher <- setCor(1,c(2,3),sobel,std=FALSE)

> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models" with nodes THERAPY, ATTRIB and SATIS; path values shown are 0.43, 0.4 and 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


for speed. The default number of bootstraps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

\[
R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)
\]

where \(\lambda_i\) is the ith eigenvalue of the eigenvalue decomposition of the matrix

\[
R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}
\]
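The definition can be checked numerically in base R. The following is a minimal sketch, not the setCor implementation: it assumes the matrix in question is the usual canonical-correlation form R_xx^-1 R_xy R_yy^-1 R_yx (whose eigenvalues are the squared canonical correlations), and it uses the built-in mtcars data purely as a stand-in. The choice of variables for the two sets is hypothetical.

```r
# Illustrative only: mtcars is a stand-in data set, not one used in this vignette.
x <- c("disp", "hp", "wt")   # first set (hypothetical choice)
y <- c("mpg", "qsec")        # second set (hypothetical choice)

R   <- cor(mtcars[, c(x, y)])
Rxx <- R[x, x]
Ryy <- R[y, y]
Rxy <- R[x, y]

# Eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations.
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M, only.values = TRUE)$values)

# Set correlation: 1 minus the product of (1 - lambda_i).
set_R2 <- 1 - prod(1 - lambda)

# Cross-check with the equivalent determinant form 1 - |R| / (|Rxx| |Ryy|).
set_R2_det <- 1 - det(R) / (det(Rxx) * det(Ryy))

c(eigen_form = set_R2, det_form = set_R2_det)
```

The two forms should agree up to rounding error, which is a convenient sanity check before trusting a hand-rolled computation.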

                                                                                      Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate
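A small constructed case makes the inflation concrete. This sketch uses a hypothetical correlation matrix (not data from this vignette) in which three x variables and three y variables are all uncorrelated except for one strong pair. It takes the mean of the squared canonical correlations as the "average" statistic, which is an assumption about which average is intended.

```r
# Hypothetical correlation structure: 3 x-variables, 3 y-variables,
# with a single strong cross-set correlation (x1 with y1) and nothing else.
Rxx <- diag(3)
Ryy <- diag(3)
Rxy <- matrix(0, 3, 3)
Rxy[1, 1] <- 0.9

# Squared canonical correlations between the two sets.
lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy),
                   only.values = TRUE)$values)

set_R2  <- 1 - prod(1 - lambda)  # Cohen's set correlation
mean_cc <- mean(lambda)          # average squared canonical correlation

round(c(set_R2 = set_R2, mean_cc = mean_cc), 2)
```

Here the set correlation is 0.81 even though only one of the nine cross-set correlations is non-zero, while the average squared canonical correlation is 0.27.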

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
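Why a correlation matrix is sufficient for multiple regression can be shown in a few lines of base R. The sketch below is not the setCor implementation. It uses the built-in mtcars data as a stand-in (an assumption for illustration) and compares the standardized weights and R2 obtained from the correlation matrix with those from lm on standardized raw data.

```r
# Built-in mtcars data stand in for the vignette's data (illustrative assumption).
vars <- c("wt", "hp", "disp", "mpg")
R    <- cor(mtcars[, vars])
Rxx  <- R[1:3, 1:3]          # predictor intercorrelations
rxy  <- R[1:3, 4]            # predictor-criterion correlations

beta <- solve(Rxx, rxy)      # standardized regression weights
R2   <- sum(beta * rxy)      # squared multiple correlation
vif  <- diag(solve(Rxx))     # variance inflation factors, 1/(1 - SMC)

# The same quantities from the raw data, for comparison.
z   <- as.data.frame(scale(mtcars[, vars]))
fit <- lm(mpg ~ wt + hp + disp, data = z)

print(round(cbind(from_R = beta, from_lm = coef(fit)[-1]), 3))
print(round(c(R2_from_R = R2, R2_from_lm = summary(fit)$r.squared), 4))
print(round(vif, 2))
```

Standard errors and significance tests additionally need the number of observations, which is why that number must be supplied when only a matrix is given.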

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Path diagram "Moderation model": ACT, gender, and ACTXgndr predict SATQ directly (c and c') and indirectly through education.]

Figure 18: Moderated multiple regression requires the raw data.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
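Several of the summary statistics in the printed output follow directly from one another, which makes a simple arithmetic check possible. The sketch below copies rounded values from the output (so agreement is only to the printed precision) and uses the standard textbook formulas for the adjusted R2 and F of a multiple regression. It is not code from the package.

```r
# Values copied from the printed setCor and lm output (rounded as printed).
cc2 <- c(0.050, 0.033, 0.008)   # squared canonical correlations
R2  <- 0.0272                   # multiple R2 for ACT
n   <- 700                      # number of observations
k   <- 3                        # number of predictors

set_R2 <- 1 - prod(1 - cc2)                       # Cohen's set correlation
avg_cc <- mean(cc2)                               # average squared canonical correlation
adj_R2 <- 1 - (1 - R2) * (n - 1) / (n - k - 1)    # shrunken (adjusted) R2
F_stat <- (R2 / k) / ((1 - R2) / (n - k - 1))     # F on k and n - k - 1 df

print(round(c(set_R2 = set_R2, avg_cc = avg_cc), 2))
print(round(c(adj_R2 = adj_R2, F = F_stat), 4))
```

To the printed precision these reproduce the reported set correlation (0.09), the average squared canonical correlation (0.03), the shrunken R2 for ACT (0.0230), and the F for ACT (6.49), the last two of which also agree with the lm summary.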

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or knitr packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
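The kind of conversion these functions perform can be illustrated with a small base-R sketch. The helper df_to_tabular() below is hypothetical and is not part of psych. It only shows how a numeric data frame maps onto the rows of a LaTeX tabular, using three rows of loadings copied from Table 2, and it omits the caption, footnotes, and APA styling that the package functions add.

```r
# Hypothetical helper (not a psych function): numeric data frame -> LaTeX tabular lines.
df_to_tabular <- function(df, digits = 2) {
  m <- as.matrix(df)
  # Format every cell with a fixed number of decimals, keeping the matrix shape.
  cells  <- matrix(sprintf(paste0("%.", digits, "f"), m), nrow = nrow(m))
  rows   <- paste0(rownames(m), " & ",
                   apply(cells, 1, paste, collapse = " & "), " \\\\")
  header <- paste0("Variable & ", paste(colnames(m), collapse = " & "), " \\\\")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(m)), "}"),
    "\\hline", header, "\\hline", rows, "\\hline",
    "\\end{tabular}")
}

# Three rows of loadings copied from Table 2.
loadings <- data.frame(
  MR1 = c(0.91, 0.89, 0.83),
  MR2 = c(-0.04, 0.06, 0.04),
  MR3 = c(0.04, -0.03, 0.00),
  row.names = c("Sentences", "Vocabulary", "Sent.Completion")
)

tab <- df_to_tabular(loadings)
cat(tab, sep = "\n")
```

The resulting lines can be pasted into a LaTeX document. For real tables the psych functions or xtable are the better choice, since they also handle headings and special characters.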

                                                                                      7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random: Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex: Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor: One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz: Converts a correlation to the corresponding Fisher z score.

geometric.mean, also harmonic.mean, find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom: Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates the effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix: Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
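The calculations behind several of these helpers are short enough to write out in base R. The functions below are hypothetical stand-ins with illustrative names, not the psych implementations, and they omit the argument checking and missing-data handling that the package versions provide.

```r
# Hypothetical base-R helpers; the names are illustrative, not the psych functions.
fisher_z       <- function(r) atanh(r)                # 0.5 * log((1 + r) / (1 - r))
geometric_mean <- function(x) exp(mean(log(x)))       # for positive values
harmonic_mean  <- function(x) 1 / mean(1 / x)         # for rates
reverse_item   <- function(x, lo, hi) (hi + lo) - x   # reverse code on a lo..hi scale

# Circular mean of clock times (hours on a 24-hour clock).
circular_mean_hours <- function(h) {
  theta <- 2 * pi * h / 24
  (atan2(mean(sin(theta)), mean(cos(theta))) * 24 / (2 * pi)) %% 24
}

# Block-diagonal "super matrix": A top left, B lower right, zeros elsewhere.
super_matrix <- function(A, B) {
  rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
        cbind(matrix(0, nrow(B), ncol(A)), B))
}

print(fisher_z(0.5))
print(geometric_mean(c(1, 2, 4)))
print(harmonic_mean(c(1, 2, 4)))
print(reverse_item(c(1, 2, 6), lo = 1, hi = 6))
print(circular_mean_hours(c(23, 3)))
print(super_matrix(diag(2), matrix(1, 1, 3)))
```

The circular mean shows why ordinary statistics mislead for time-of-day data: the arithmetic mean of 23:00 and 03:00 is 13:00, whereas the circular mean is 01:00.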

                                                                                      8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")

                                                                                      10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                                      11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                                                                      References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



for speed. The default number of bootstraps is 5000.
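What the number of bootstrap iterations controls can be illustrated with a small base R sketch of a percentile bootstrap for an indirect effect. This is not the code used by mediate; the simulated data, the variable names x, m, and y, the helper indirect(), and the choice of 500 iterations are all assumptions made for the illustration.

```r
# Sketch of a percentile bootstrap for an indirect effect a*b.
# Simulated data; x, m, and y are hypothetical variable names.
set.seed(1)
n <- 300
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.2 * x + rnorm(n)
dat <- data.frame(x, m, y)

# Hypothetical helper: product of the x -> m path and the m -> y path.
indirect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]        # path from x to the mediator
  b <- coef(lm(y ~ x + m, data = d))["m"]    # path from the mediator to y, holding x fixed
  unname(a * b)
}

n_iter <- 500                                # fewer than 5000, to keep the sketch fast
boot_ab <- replicate(n_iter, indirect(dat[sample(n, replace = TRUE), ]))

ab <- indirect(dat)                          # point estimate of the indirect effect
ci <- quantile(boot_ab, c(0.025, 0.975))     # percentile confidence interval
```

More iterations give a more stable interval at the cost of run time, which is the trade-off behind choosing a small value for a quick demonstration.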

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$$
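A minimal base R sketch of this computation follows. It assumes that the eigenvalues in question are the squared canonical correlations between the two sets, that is, the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx. The data are simulated and the object names are illustrative; setCor itself is not called.

```r
# Sketch: set correlation from the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx.
# The data are simulated; all object names are illustrative only.
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), n, 3)                    # first set: 3 variables
y <- cbind(x[, 1] + rnorm(n), x[, 2] + rnorm(n))   # second set: 2 variables

R   <- cor(cbind(x, y))
Rxx <- R[1:3, 1:3]
Ryy <- R[4:5, 4:5]
Rxy <- R[1:3, 4:5]

# The non-zero eigenvalues are the squared canonical correlations.
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M)$values)[1:min(ncol(x), ncol(y))]

set_R2 <- 1 - prod(1 - lambda)

# Cross-check with the determinant form 1 - |R| / (|Rxx| |Ryy|).
det_R2 <- 1 - det(R) / (det(Rxx) * det(Ryy))

# Average squared canonical correlation, a more conservative summary.
mean_cc2 <- mean(lambda)
```

Because the product form can never be smaller than the largest single eigenvalue, set_R2 is always at least as large as mean_cc2, which is why a few strong cross-set relationships can dominate it.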

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic, based upon the average canonical correlation, might then be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
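Why a correlation matrix is sufficient can be seen in a short base R sketch: the standardized regression weights solve Rxx β = rxy, and the squared multiple correlation is β'rxy. The mtcars data set (shipped with R) and the chosen variables are stand-ins for illustration, and this is not the setCor implementation.

```r
# Sketch: standardized regression weights and R^2 from a correlation matrix alone.
# mtcars ships with base R and stands in for any data set.
vars  <- c("mpg", "wt", "hp", "qsec")
preds <- c("wt", "hp", "qsec")
R     <- cor(mtcars[, vars])

rxx <- R[preds, preds]         # predictor intercorrelations
rxy <- R[preds, "mpg"]         # predictor-criterion correlations

beta <- solve(rxx, rxy)        # standardized beta weights
R2   <- sum(beta * rxy)        # squared multiple correlation

# The same quantities from the standardized raw data, for comparison.
z   <- as.data.frame(scale(mtcars[, vars]))
fit <- lm(mpg ~ wt + hp + qsec, data = z)
```

The values in beta and R2 agree with coef(fit)[-1] and summary(fit)$r.squared. Tests of significance additionally need the number of observations, which a correlation matrix does not carry.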

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. First, setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

                                                                                        44

                                                                                        Call mediate(y = c(SATQ) x = c(ACT) m = education data = satact

                                                                                        mod = gender niter = 50 std = TRUE)

                                                                                        The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

                                                                                        Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

                                                                                        Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

                                                                                        Indirect effect (ab) of ACT on SATQ through education = -001

                                                                                        Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

                                                                                        Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

                                                                                        Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

                                                                                        Indirect effect (ab) of gender on SATQ through education = 0

                                                                                        Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

                                                                                        Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

                                                                                        Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

                                                                                        Indirect effect (ab) of ACTXgndr on SATQ through education = 0

                                                                                        Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

                                                                                        R2 of model = 037

                                                                                        To see the longer output specify short = FALSE in the print statement

                                                                                        Full output

                                                                                        Total effect estimates (c)

                                                                                        SATQ se t Prob

                                                                                        ACT 058 003 1925 000e+00

                                                                                        gender -014 003 -478 210e-06

                                                                                        ACTXgndr 000 003 002 985e-01

                                                                                        Direct effect estimates (c)SATQ se t Prob

                                                                                        ACT 059 003 1926 000e+00

                                                                                        gender -014 003 -463 437e-06

                                                                                        ACTXgndr 000 003 001 992e-01

                                                                                        a effect estimates

                                                                                        education se t Prob

                                                                                        ACT 016 004 422 277e-05

                                                                                        gender 009 004 250 128e-02

                                                                                        ACTXgndr -001 004 -015 883e-01

                                                                                        b effect estimates

                                                                                        SATQ se t Prob

                                                                                        education -004 003 -145 0147

                                                                                        ab effect estimates

                                                                                        SATQ boot sd lower upper

                                                                                        ACT -001 -001 001 0 0

                                                                                        gender 000 000 000 0 0

                                                                                        ACTXgndr 000 000 000 0 0

                                                                                        Moderation model

                                                                                        ACT

                                                                                        gender

                                                                                        ACTXgndr

                                                                                        SATQ

                                                                                        education016 c = 058

                                                                                        c = 059

                                                                                        009 c = minus014

                                                                                        c = minus014

                                                                                        minus001 c = 0

                                                                                        c = 0

                                                                                        minus004

                                                                                        minus004

                                                                                        minus007

                                                                                        002

                                                                                        Figure 18 Moderated multiple regression requires the raw data

                                                                                        45

                                                                                        Min 1Q Median 3Q Max

                                                                                        -252458 -32133 07769 35921 92630

                                                                                        Coefficients

                                                                                        Estimate Std Error t value Pr(gt|t|)

                                                                                        (Intercept) 2741706 082140 33378 lt 2e-16

                                                                                        gender -048606 037984 -1280 020110

                                                                                        education 047890 015235 3143 000174

                                                                                        age 001623 002278 0712 047650

                                                                                        ---

                                                                                        Signif codes 0 0001 001 005 01 1

                                                                                        Residual standard error 4768 on 696 degrees of freedom

                                                                                        Multiple R-squared 00272 Adjusted R-squared 002301

                                                                                        F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
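The symmetry, and the link between the set correlation and the canonical correlations, can be checked with a short base R sketch. The data (mtcars) and the two variable sets are assumptions chosen only for illustration, and the code is not taken from setCor. It uses Cohen's definition R2 = 1 - |R| / (|Rxx| |Ryy|), which equals 1 minus the product of (1 - squared canonical correlation). Applied to the three squared canonical correlations reported above, this product form gives about 0.09, and their mean gives about 0.03.

```r
# Illustrative only: mtcars and the two variable sets are assumptions.
x <- as.matrix(mtcars[, c("wt", "hp", "qsec")])     # first set
y <- as.matrix(mtcars[, c("mpg", "disp", "drat")])  # second set

R   <- cor(cbind(x, y))   # full correlation matrix
Rxx <- cor(x)             # within-set correlations, first set
Ryy <- cor(y)             # within-set correlations, second set

# Cohen's set correlation from determinants
set_R2 <- 1 - det(R) / (det(Rxx) * det(Ryy))

# Squared canonical correlations, computed in both directions
rho2_xy <- cancor(x, y)$cor^2
rho2_yx <- cancor(y, x)$cor^2

# Equivalent form: one large canonical correlation is enough to push
# the set correlation towards 1, whatever the others are.
set_R2_from_cc <- 1 - prod(1 - rho2_xy)

# The average squared canonical correlation is more conservative
avg_rho2 <- mean(rho2_xy)

print(round(c(set_R2 = set_R2, average_rho2 = avg_rho2), 3))
```

Swapping the roles of x and y leaves the canonical correlations, and therefore the set correlation, unchanged.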

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

                   MR1    MR2    MR3
MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00
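What such a conversion involves can be sketched in a few lines of base R. The helper df_to_latex below is hypothetical: it is written for this illustration, handles only numeric columns, and is not the code of df2latex or fa2latex. The two example rows are taken from Table 2.

```r
# Hypothetical helper, loosely in the spirit of df2latex; it is NOT the
# psych implementation and supports only numeric columns.
df_to_latex <- function(df, digits = 2, caption = "A table") {
  m     <- as.matrix(df)
  # Format every cell with a fixed number of decimals
  cells <- matrix(sprintf(paste0("%.", digits, "f"), m), nrow = nrow(m))
  # One LaTeX row per data row: name & cell & cell ... \\
  rows  <- paste0(rownames(m), " & ",
                  apply(cells, 1, paste, collapse = " & "), " \\\\")
  c("\\begin{table}[htpb]",
    paste0("\\caption{", caption, "}"),
    paste0("\\begin{tabular}{l", strrep("r", ncol(m)), "}"),
    "\\hline",
    paste0("Variable & ", paste(colnames(m), collapse = " & "), " \\\\"),
    "\\hline",
    rows,
    "\\hline",
    "\\end{tabular}",
    "\\end{table}")
}

# Two rows of loadings copied from Table 2
loadings <- data.frame(MR1 = c(0.91, 0.89), MR2 = c(-0.04, 0.06),
                       row.names = c("Sentences", "Vocabulary"))
tex <- df_to_latex(loadings, caption = "Two rows of Table 2")
writeLines(tex)
```

The result is a character vector, one element per line of LaTeX, which can be written to a file with writeLines and included in a manuscript.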


                                                                                        7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
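Several of these helpers are thin wrappers around simple formulas. The base R sketch below shows what the Fisher z transformation, the geometric and harmonic means, and a two-block super matrix compute. The function names are local to this sketch (assumptions made for illustration) and are not the psych functions themselves, whose handling of missing data and multiple inputs is more complete.

```r
# Base R equivalents of a few helpers; these names are local to this
# sketch and are not the psych functions themselves.
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))  # identical to atanh(r)
geo_mean <- function(x) exp(mean(log(x)))             # for ratios, growth rates
har_mean <- function(x) 1 / mean(1 / x)               # for rates such as speeds

# Block-diagonal "super matrix": A top left, B lower right, 0s elsewhere
super_matrix <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

z <- fisher_z(0.5)               # Fisher z of r = 0.5
g <- geo_mean(c(1, 4, 16))       # geometric mean
h <- har_mean(c(40, 60))         # harmonic mean
S <- super_matrix(matrix(1, 2, 2), diag(3))

print(c(z = z, g = g, h = h))
print(S)
```

The super matrix here is 5 by 5, with the 2 by 2 block of ones in the top left, the 3 by 3 identity matrix in the lower right, and zeros in the two off-diagonal quadrants.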

                                                                                        8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
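The same installation can also be requested from the R console. The wrapper below is a sketch: the function name install_psych_dev is hypothetical, and it assumes that the repository still serves a source build of psych. It relies only on the standard install.packages arguments repos and type, and nothing is downloaded until the function is called.

```r
# Assumption: the repository still serves a source build of psych.
# Wrapped in a (hypothetical) function so that nothing is downloaded
# until it is explicitly called.
install_psych_dev <- function(repo = "http://personality-project.org/r") {
  install.packages("psych", repos = repo, type = "source")
}

# install_psych_dev()   # uncomment to install, then restart R
```

Installing from source replaces any CRAN copy of psych in the same library, so it is worth checking the result afterwards with sessionInfo().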

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")


                                                                                        10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                                        11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34





Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

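[Path diagram "Moderation model": ACT, gender, and ACTXgndr predict SATQ, with education as the mediator. Paths to education: 0.16, 0.09, -0.01; path from education to SATQ: -0.04; total and direct effects: ACT c = 0.58, c' = 0.59; gender c = -0.14, c' = -0.14; ACTXgndr c = 0, c' = 0.]

Figure 18: Moderated multiple regression requires the raw data.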
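As a quick check on how the pieces of this output fit together: the indirect effect is the product of the a and b paths, and the direct effect (c') is the total effect (c) minus that product. The following is a minimal base R sketch of that arithmetic, using the rounded ACT coefficients printed above. The variable names are illustrative only and are not part of psych.

```r
# Rounded standardized coefficients for ACT, copied from the printed output
a       <- 0.16    # 'a' path: ACT -> education
b       <- -0.04   # 'b' path: education -> SATQ
c_total <- 0.58    # total effect (c) of ACT on SATQ

ab       <- a * b          # indirect effect of ACT through education
c_direct <- c_total - ab   # direct effect (c')

round(ab, 2)         # -0.01, the 'ab' estimate for ACT
round(c_direct, 2)   # 0.59, the direct effect estimate for ACT
```

Because the inputs are already rounded to two decimals, this reproduces the printed values only to that precision.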


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
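The agreement between the lm and setCor results for ACT can be checked by hand. The sketch below, in base R, recomputes the shrunken (adjusted) R2 and the F statistic from the reported multiple R2, the sample size, and the number of predictors, using the standard formulas for multiple regression. The variable names are illustrative, and the result is only as precise as the rounded R2 taken from the output.

```r
R2 <- 0.0272   # multiple R2 for ACT, from the output above
n  <- 700      # number of observations (n.obs)
k  <- 3        # number of predictors: gender, education, age

df1 <- k
df2 <- n - k - 1                               # residual degrees of freedom

shrunken_R2 <- 1 - (1 - R2) * (n - 1) / df2    # adjusted (shrunken) R2
F_stat      <- (R2 / df1) / ((1 - R2) / df2)   # F test of R2 = 0
p_value     <- pf(F_stat, df1, df2, lower.tail = FALSE)

df2                     # 696
round(shrunken_R2, 4)   # 0.023
round(F_stat, 2)        # 6.49
p_value                 # upper-tail probability of F
```

These values match the Adjusted R-squared and F-statistic lines of the lm summary and the ACT column of the setCor output, up to rounding.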

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00


                                                                                          7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
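To make a few of these helpers concrete, the following base R sketch computes the quantities that fisherz, geometric.mean, harmonic.mean, and superMatrix are described as returning. It deliberately does not call psych, so it shows the underlying arithmetic rather than the package's own implementation; the object names (z, geo_mean, harm_mean, S) and the example values are illustrative assumptions.

```r
# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r)), i.e. atanh(r)
r <- 0.5
z <- 0.5 * log((1 + r) / (1 - r))

# Geometric and harmonic means of a vector of positive values
x <- c(1, 2, 4)
geo_mean  <- exp(mean(log(x)))   # cube root of 1 * 2 * 4 = 2
harm_mean <- 1 / mean(1 / x)     # 3 / (1 + 1/2 + 1/4) = 12/7

# A "super matrix": A on the top left, B on the lower right, 0s elsewhere
A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 1)
S <- rbind(cbind(A, matrix(0, nrow(A), ncol(B))),
           cbind(matrix(0, nrow(B), ncol(A)), B))

dim(S)   # 5 rows (2 + 3) and 3 columns (2 + 1)
```

The block-diagonal layout of S is what makes the "super matrix" convenient for scoring keys: each set of items only has nonzero entries in its own columns.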

                                                                                          8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

                                                                                          9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


                                                                                          10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                                          11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                                                                          References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

                                                                                          Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

                                                                                          Nunnally J C (1967) Psychometric theory McGraw-Hill New York

                                                                                          54

                                                                                          Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

                                                                                          3rd edition

                                                                                          Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

                                                                                          Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

                                                                                          Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

                                                                                          Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

                                                                                          Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

                                                                                          Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

                                                                                          Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

                                                                                          Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

                                                                                          Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

                                                                                          Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

                                                                                          Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

                                                                                          Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

                                                                                          Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang

                                                                                          55

                                                                                          for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

                                                                                          Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

                                                                                          Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

                                                                                          Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

                                                                                          Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

                                                                                          Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

                                                                                          Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

                                                                                          Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

                                                                                          Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

                                                                                          Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

                                                                                          Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

                                                                                          Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

                                                                                          56



     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
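The symmetry can be checked with base R. The following is a minimal sketch, not the setCor implementation: it uses stats::cancor on simulated data (the data and the helper name set_r2 are hypothetical) and assumes the set correlation is defined from the canonical correlations as R2 = 1 - prod(1 - rho^2). It also recomputes the summary values from the three squared canonical correlations in the output, read here as 0.050, 0.033 and 0.008.

```r
# Sketch: the set correlation depends only on the canonical correlations,
# which are the same whichever set is treated as the predictors.
set.seed(42)
n <- 200
x <- matrix(rnorm(n * 3), n, 3)                              # hypothetical predictor set
y <- cbind(x[, 1] + rnorm(n), x[, 2] + rnorm(n), rnorm(n))   # hypothetical criterion set

# Set correlation from canonical correlations (assumed definition)
set_r2 <- function(a, b) {
  rho <- cancor(a, b)$cor
  1 - prod(1 - rho^2)
}

r2_xy <- set_r2(x, y)
r2_yx <- set_r2(y, x)
all.equal(r2_xy, r2_yx)      # the two directions agree

# Recompute the summaries from the reported squared canonical correlations
cc2 <- c(0.050, 0.033, 0.008)
set_r2_reported <- 1 - prod(1 - cc2)
round(set_r2_reported, 2)    # 0.09
round(mean(cc2), 2)          # 0.03
```

Both rounded values agree with the Cohen's set correlation and the average squared canonical correlation printed in the output.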

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00

                                                                                            7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
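Several of the helpers in this list compute simple, well-known quantities. The following base R sketch shows what those quantities are. The *_sketch names are hypothetical stand-ins written for illustration; they are not the psych functions, whose arguments and options may differ.

```r
# Base R sketches of what several of the helper functions compute.
# The *_sketch names are hypothetical; they are not the psych functions.

# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r))
fisherz_sketch <- function(r) 0.5 * log((1 + r) / (1 - r))

# Geometric mean: exponential of the mean log (positive values only)
geometric_mean_sketch <- function(x) exp(mean(log(x)))

# Harmonic mean: reciprocal of the mean reciprocal
harmonic_mean_sketch <- function(x) 1 / mean(1 / x)

# Reverse code an item scored from lo to hi
reverse_sketch <- function(x, lo, hi) (lo + hi) - x

# Block-diagonal "super matrix": A top left, B lower right, 0s elsewhere
super_matrix_sketch <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

fisherz_sketch(0.5)                  # same value as atanh(0.5)
geometric_mean_sketch(c(1, 4, 16))   # 4
harmonic_mean_sketch(c(1, 2, 4))     # 12/7
reverse_sketch(1:6, lo = 1, hi = 6)  # 6 5 4 3 2 1
super_matrix_sketch(matrix(1, 2, 2), matrix(2, 1, 3))
```

The geometric mean suits ratios and growth rates, while the harmonic mean suits rates averaged over a fixed quantity, which is why the choice depends on the kind of data.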

                                                                                            8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
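To make the idea of Thurstonian scaling of a preference matrix concrete, here is a minimal base R sketch. It assumes the simplest (Case V) model, in which each proportion is converted to a normal deviate and the deviates are averaged for each item. The three items and their proportions are made up for illustration; this is not the vegetables data, and the scaling function in psych may differ in detail.

```r
# Sketch of Thurstonian (Case V) scaling from paired comparison proportions.
# The items and proportions below are made up for illustration.
items <- c("A", "B", "C")

# p[i, j] = proportion of judges preferring the column item j over the row item i
p <- matrix(c(0.50, 0.70, 0.90,
              0.30, 0.50, 0.75,
              0.10, 0.25, 0.50),
            nrow = 3, byrow = TRUE, dimnames = list(items, items))

z <- qnorm(p)                                      # proportions to normal deviates
scale_values <- colMeans(z)                        # average deviate for each column item
scale_values <- scale_values - min(scale_values)   # anchor the lowest item at 0
round(scale_values, 2)
```

Items that are chosen more often end up with larger scale values, so in this made-up example C is scaled highest and A is anchored at 0.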

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")

                                                                                            10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                                            11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

                                                                                            References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


                                                                                            irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

                                                                                            affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

                                                                                            59

                                                                                            biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

                                                                                            fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

                                                                                            60

                                                                                            polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

                                                                                            rtest 28

                                                                                            rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

                                                                                            R package

                                                                                            61

                                                                                            ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

                                                                                            rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

                                                                                            SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

                                                                                            spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

                                                                                            table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

                                                                                            vegetables 50 51violinBy 14 18vss 5 6

                                                                                            weighted least squares 6withinBetween 37

                                                                                            xtable 47

                                                                                            62


age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
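The symmetry can be checked numerically. The following is a minimal base R sketch, not the code that setCor runs: it assumes Cohen's set correlation can be written as 1 - det(R) / (det(Rx) det(Ry)), which equals 1 minus the product of (1 - squared canonical correlation). The helper name set_cor_r2 and the choice of mtcars variables are illustrative assumptions only.

```r
# Base R sketch of Cohen's set correlation; set_cor_r2 is a hypothetical helper,
# not a psych function.
set_cor_r2 <- function(data, x, y) {
  R  <- cor(data[, c(x, y)])          # joint correlation matrix, x block first
  ix <- seq_along(x)
  Rx <- R[ix, ix, drop = FALSE]       # correlations within the x set
  Ry <- R[-ix, -ix, drop = FALSE]     # correlations within the y set
  1 - det(R) / (det(Rx) * det(Ry))
}

x <- c("mpg", "disp", "hp")           # arbitrary "predictor" set from mtcars
y <- c("wt", "qsec")                  # arbitrary "criterion" set

r2_xy <- set_cor_r2(mtcars, x, y)
r2_yx <- set_cor_r2(mtcars, y, x)     # swap the roles of the two sets

# The same quantity from the canonical correlations: 1 - prod(1 - rho^2)
rho   <- cancor(mtcars[, x], mtcars[, y])$cor
r2_cc <- 1 - prod(1 - rho^2)

print(c(x_to_y = r2_xy, y_to_x = r2_yx, canonical = r2_cc))
```

All three printed values should agree up to rounding error, because neither the determinants nor the canonical correlations depend on which set is treated as the predictor. Applying the product form to the three squared canonical correlations in the output above gives a value of about 0.09, in line with the reported set correlation.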

6 Converting output to APA style tables using LaTeX

Although for most purposes, using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00
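To make the idea of a lower-diagonal correlation table concrete, here is a small base R sketch that writes a correlation matrix as a LaTeX tabular with the upper triangle left blank. It is an illustration only: lower_to_latex is a hypothetical helper, not psych's cor2latex, and it makes no attempt at full APA formatting.

```r
# lower_to_latex is a hypothetical helper for illustration; it is not cor2latex.
lower_to_latex <- function(R, digits = 2) {
  stopifnot(is.matrix(R), nrow(R) == ncol(R))
  # Format every cell with a fixed number of decimals, keeping the matrix shape
  cells <- matrix(formatC(R, format = "f", digits = digits),
                  nrow = nrow(R), dimnames = dimnames(R))
  cells[upper.tri(cells)] <- ""                       # blank out the upper triangle
  header <- paste(c("Variable", colnames(R)), collapse = " & ")
  rows   <- paste(rownames(R), apply(cells, 1, paste, collapse = " & "), sep = " & ")
  c(
    paste0("\\begin{tabular}{l", strrep("r", ncol(R)), "}"),
    paste0(header, " \\\\"),
    "\\hline",
    paste0(rows, " \\\\"),
    "\\hline",
    "\\end{tabular}"
  )
}

R   <- cor(mtcars[, c("mpg", "wt", "hp")])   # example data shipped with base R
tab <- lower_to_latex(R)
writeLines(tab)
```

The result is a character vector with one element per line of LaTeX, so it can be printed with writeLines or written to a file and pulled into a manuscript with \input.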


                                                                                              7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
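Several of the helpers in this list compute quantities that are easy to write out by hand. The sketch below uses base R only, to show what is being computed rather than how psych implements it; the psych functions named in the comments may treat missing data and other options differently, and the example values are arbitrary.

```r
# Base R equivalents, written out for illustration only.

# Fisher r-to-z transformation (the quantity described for fisherz)
r <- c(0.10, 0.50, 0.90)
z <- 0.5 * log((1 + r) / (1 - r))        # the same as atanh(r)

# Geometric and harmonic means (compare geometric.mean and harmonic.mean)
x <- c(1, 2, 4, 8)
geo_mean  <- exp(mean(log(x)))           # nth root of the product
harm_mean <- 1 / mean(1 / x)             # reciprocal of the mean reciprocal

# Block-diagonal "super matrix": A top left, B lower right, 0s elsewhere
A <- matrix(1, nrow = 2, ncol = 2)
B <- matrix(2, nrow = 3, ncol = 1)
S <- rbind(
  cbind(A, matrix(0, nrow(A), ncol(B))),
  cbind(matrix(0, nrow(B), ncol(A)), B)
)

# First and last rows of a data set (the idea behind headtail and topBottom)
ends <- rbind(head(mtcars, 2), tail(mtcars, 2))

print(round(z, 3))
print(c(geometric = geo_mean, harmonic = harm_mean, arithmetic = mean(x)))
print(S)
print(ends)
```

For positive data the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean, so the second print statement gives a quick sanity check on which mean suits a given kind of data.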

                                                                                              8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
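As a sketch of the kind of multidimensional scaling demonstration that a matrix of city distances supports, the following uses classical scaling from base R. Note the substitution: UScitiesD, a base R data set of distances between 10 US cities, stands in for the cities data described above so that the example runs without psych; it is a different data set, and the variable names are illustrative.

```r
# UScitiesD (base R) is used here in place of psych's cities data.
fit <- cmdscale(UScitiesD, k = 2)        # classical (metric) MDS in two dimensions
colnames(fit) <- c("dim1", "dim2")

# How well do the 2-D map distances reproduce the original distances?
fit_r <- cor(as.vector(UScitiesD), as.vector(dist(fit)))
print(round(fit))
print(fit_r)

# Draw the recovered map when running interactively
if (interactive()) {
  plot(fit, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
  text(fit, labels = rownames(fit))
}
```

The recovered configuration is only determined up to rotation and reflection, so the map may appear flipped relative to a conventional one even when the inter-city distances are reproduced well.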

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")


                                                                                              10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                                              11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

                   MR1    MR2    MR3
MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00

                                                                                                7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
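As a rough illustration of what a block randomized structure looks like, the base R sketch below assigns each condition of a single independent variable exactly once within every block, in a fresh random order. The helper `block_randomize`, its arguments, and the seed are hypothetical choices made for this sketch; it is not the psych implementation and handles only one independent variable.

```r
# Base R sketch of block randomization (hypothetical helper, not psych code).
# Every block contains each condition exactly once, in random order.
block_randomize <- function(n_subjects, n_cond) {
  stopifnot(n_subjects %% n_cond == 0)       # require complete blocks
  n_blocks <- n_subjects / n_cond
  # Draw an independent random permutation of the conditions for each block
  condition <- unlist(lapply(seq_len(n_blocks), function(b) sample(n_cond)))
  data.frame(subject   = seq_len(n_subjects),
             block     = rep(seq_len(n_blocks), each = n_cond),
             condition = condition)
}

set.seed(42)                                  # arbitrary seed, for reproducibility only
design <- block_randomize(n_subjects = 12, n_cond = 4)
print(design)

# Each condition should appear exactly once in each block
print(table(design$block, design$condition))
```

The cross-tabulation at the end is a quick check that the design is balanced within blocks.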

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.
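To see why time-of-day data need circular rather than ordinary statistics, consider averaging two clock times that straddle midnight. The base R sketch below maps hours onto angles, averages the sine and cosine components, and maps the result back to hours. The helper name `circular_mean_hours` is hypothetical, and this shows only the general idea of a circular mean, not the psych code.

```r
# Base R sketch of a circular mean for clock times (hypothetical helper).
circular_mean_hours <- function(hours, period = 24) {
  angle <- 2 * pi * hours / period               # map clock time onto the circle
  mean_angle <- atan2(mean(sin(angle)), mean(cos(angle)))
  (mean_angle * period / (2 * pi)) %% period     # back to hours in [0, period)
}

times <- c(22, 1)               # 10 pm and 1 am
mean(times)                     # arithmetic mean: 11.5, i.e. late morning (misleading)
circular_mean_hours(times)      # circular mean: 23.5, i.e. 11:30 pm
```

The arithmetic mean lands on the opposite side of the clock, while the circular mean falls between the two observations.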

                                                                                                fisherz Convert a correlation to the corresponding Fisher z score

                                                                                                geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data

                                                                                                ICC and cohenkappa are typically used to find the reliability for raters

                                                                                                headtail combines the head and tail functions to show the first and last lines of a dataset or output

                                                                                                topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

                                                                                                mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe

                                                                                                prep finds the probability of replication for an F t or r and estimate effect size

                                                                                                partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)

                                                                                                rangeCorrection will correct correlations for restriction of range

                                                                                                reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
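Several of the helpers listed above implement short formulas, so their behavior can be sketched in base R. The block below is an illustration under stated assumptions, not the package source: the names fisher_z, geometric_mean, harmonic_mean, super_matrix and reverse_code are hypothetical stand-ins for the psych functions, and the sample values are made up.

```r
# Base R sketches of a few helpers described above.
# All function names here are hypothetical stand-ins, not the psych API.

# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r)) = atanh(r)
fisher_z <- function(r) atanh(r)

# Geometric mean: exp of the mean log; suited to ratios and growth rates
geometric_mean <- function(x) exp(mean(log(x)))

# Harmonic mean: reciprocal of the mean reciprocal; suited to rates
harmonic_mean <- function(x) 1 / mean(1 / x)

# Super matrix: A in the upper left, B in the lower right, zeros elsewhere
super_matrix <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

# Reverse code items scored from lo to hi: new = lo + hi - old
reverse_code <- function(x, lo, hi) lo + hi - x

# Made-up inputs for illustration
z <- fisher_z(0.5)
gm <- geometric_mean(c(1, 2, 4))
hm <- harmonic_mean(c(1, 2, 4))
S <- super_matrix(diag(2), matrix(1, 1, 3))
rev_items <- reverse_code(c(1, 2, 6), lo = 1, hi = 6)
```

If psych is installed, the corresponding functions named in the list can be checked against these sketches; their exact arguments are documented in the package help pages.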

                                                                                                8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
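The multidimensional scaling use mentioned for cities can be sketched with base R's cmdscale. Because the psych data may not be available in every installation, this sketch substitutes the UScitiesD distances that ship with base R (10 US cities, not the 11-city matrix described above); the variable names are illustrative only.

```r
# Classical (metric) multidimensional scaling on a city distance matrix.
# Assumption: UScitiesD (base R 'datasets' package) stands in for the
# 'cities' matrix described above; it holds distances between 10 US cities.
d <- datasets::UScitiesD

# Recover a two-dimensional configuration from the distances
coords <- cmdscale(d, k = 2)
colnames(coords) <- c("dim1", "dim2")

# Distances between the recovered points should track the input distances
fit <- cor(as.vector(dist(coords)), as.vector(d))

# Plot the configuration; axis orientation is arbitrary in MDS
plot(coords, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(coords, labels = rownames(coords))
```

The same steps should apply to any symmetric distance matrix once it is converted with as.dist; the recovered map may appear rotated or reflected relative to a geographic map, since only the inter-point distances are constrained.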

                                                                                                9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")

                                                                                                10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                                                11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0
[9] lattice_0.20-34

                                                                                                References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


                                                                                                SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

                                                                                                spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

                                                                                                table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

                                                                                                vegetables 50 51violinBy 14 18vss 5 6

                                                                                                weighted least squares 6withinBetween 37

                                                                                                xtable 47

                                                                                                62


                                                                                                  7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Converts a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
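As a rough illustration of what a few of these helpers compute, the following base R sketch re-implements the Fisher z transformation, the geometric and harmonic means, and a two-matrix block-diagonal combination. The function names used here (fisher_z, geometric_mean, harmonic_mean, super_matrix) are hypothetical stand-ins written for this example; they are not the psych source code, and the psych versions may treat missing values, data frames, and dimension names differently.

```r
# Base R sketches only; names are illustrative, not the psych implementations.

# Fisher r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r)), i.e. atanh(r).
fisher_z <- function(r) {
  stopifnot(all(abs(r) < 1))
  0.5 * log((1 + r) / (1 - r))
}

# Geometric mean: exp of the mean log; defined for positive values only.
geometric_mean <- function(x) {
  stopifnot(all(x > 0))
  exp(mean(log(x)))
}

# Harmonic mean: reciprocal of the mean reciprocal; defined for positive values only.
harmonic_mean <- function(x) {
  stopifnot(all(x > 0))
  1 / mean(1 / x)
}

# Block-diagonal combination: A in the upper left, B in the lower right,
# zeros in the two off-diagonal quadrants.
super_matrix <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

z  <- fisher_z(0.5)                 # same value as atanh(0.5)
gm <- geometric_mean(c(1, 2, 4))
hm <- harmonic_mean(c(1, 2, 4))
S  <- super_matrix(diag(2), matrix(1, nrow = 1, ncol = 3))
print(S)
```

With the psych package loaded, fisherz, geometric.mean, harmonic.mean, and superMatrix would be the functions to call instead of these stand-ins.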

                                                                                                  8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iq.items). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
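One way to pull any of these data sets into a session is sketched below. The helper load_psych_data is a hypothetical convenience function written for this example, not part of psych. It assumes the data set is shipped with the installed psych package and returns NULL if the package or the data set is not available in the installed version.

```r
# Hypothetical helper: return a psych example data set by name,
# or NULL if psych (or that data set) is not available.
load_psych_data <- function(name) {
  if (!requireNamespace("psych", quietly = TRUE)) return(NULL)
  env <- new.env()
  # data() warns rather than fails when a data set is missing, so check afterwards.
  suppressWarnings(utils::data(list = name, package = "psych", envir = env))
  if (exists(name, envir = env, inherits = FALSE)) get(name, envir = env) else NULL
}

sat <- load_psych_data("sat.act")
if (!is.null(sat)) {
  print(dim(sat))    # number of subjects and variables
  print(head(sat))   # first few rows
}
```

Once psych is attached with library(psych), the same objects can usually be referred to directly by name (for example, passing sat.act to describe).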

                                                                                                  9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
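From the R console, the same installation can probably be done with install.packages pointed at that repository. The sketch below is an assumption about how this would look rather than a documented procedure: install_psych_dev is a hypothetical wrapper, the repository URL is the one given in the text, and the call only runs when dry_run is set to FALSE, since installing from source needs network access and a working build environment.

```r
# Repository named in the text; treated here as the 'repos' argument (an assumption).
dev_repo <- "http://personality-project.org/r"

# Hypothetical wrapper: build the install.packages() arguments and,
# only if dry_run = FALSE, actually perform the source install.
install_psych_dev <- function(repo = dev_repo, dry_run = TRUE) {
  args <- list(pkgs = "psych", repos = repo, type = "source")
  if (!dry_run) do.call(utils::install.packages, args)
  invisible(args)
}

args <- install_psych_dev()   # dry run: nothing is installed
str(args)
```

Calling install_psych_dev(dry_run = FALSE) would attempt the install; afterwards, packageVersion("psych") shows which version is actually in use.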

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")


                                                                                                  10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                                                  11 SessionInfo

                                                                                                  This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


                                                                                                  References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, N.J.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



superMatrix: Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
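The block layout that superMatrix produces can be sketched in base R. The helper name super_matrix_sketch and the two small matrices are hypothetical and used only for illustration; this is not the psych implementation, which is called as superMatrix(A, B).

```r
# Hypothetical helper: a base-R sketch of the layout described above.
# A goes in the top left, B in the lower right, zeros fill the other quadrants.
super_matrix_sketch <- function(A, B) {
  A <- as.matrix(A)
  B <- as.matrix(B)
  top    <- cbind(A, matrix(0, nrow(A), ncol(B)))   # A | 0
  bottom <- cbind(matrix(0, nrow(B), ncol(A)), B)   # 0 | B
  rbind(top, bottom)
}

A <- matrix(1, nrow = 2, ncol = 2)   # made-up 2 x 2 block
B <- matrix(2, nrow = 3, ncol = 1)   # made-up 3 x 1 block
S <- super_matrix_sketch(A, B)
dim(S)                               # rows add up, columns add up
S
```

When the blocks are scoring keys, each row of the result is an item and each column a scale, so an item from one block gets zero weight on every scale from the other block.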

                                                                                                    8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
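As an illustration of what Thurstonian scaling does with paired comparison preferences, here is a minimal base-R sketch of the classical Case V computation. The 3 x 3 preference matrix is made up, and the convention that P[i, j] is the proportion preferring the column item j over the row item i is an assumption of this sketch; check the help page for vegetables before applying it to the packaged data.

```r
# Made-up paired-comparison proportions for three hypothetical items.
# Assumed convention: P[i, j] = proportion preferring column item j to row item i,
# so P[i, j] + P[j, i] = 1 and the diagonal is 0.5.
items <- c("a", "b", "c")
P <- matrix(c(0.5, 0.7, 0.9,
              0.3, 0.5, 0.8,
              0.1, 0.2, 0.5),
            nrow = 3, byrow = TRUE, dimnames = list(items, items))

z <- qnorm(P)                     # proportions to normal deviates
scale_values <- colMeans(z)       # Case V: average deviate for each column item
scale_values <- scale_values - min(scale_values)   # anchor the lowest item at 0
round(scale_values, 2)
```

Items that are preferred more often receive larger scale values, and the spacing between values reflects how decisively one item is preferred over another.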

Thurstone: Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi: 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act: Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi: A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
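To show the kind of multidimensional scaling demonstration a distance matrix such as cities supports, the following sketch uses only base R. It substitutes UScitiesD, a 10-city distance object that ships with R's datasets package, as a stand-in; it is not the 11-city cities matrix described above, and the choice of classical (metric) scaling with two dimensions is an assumption of the example.

```r
# Stand-in data: UScitiesD from the datasets package (10 US cities),
# not the psych `cities` matrix.
d <- UScitiesD

# Classical (metric) multidimensional scaling into two dimensions.
fit <- cmdscale(d, k = 2)
head(fit)

# How well do distances in the 2-D solution reproduce the input distances?
recovered <- dist(fit)
fit_quality <- cor(as.vector(d), as.vector(recovered))
fit_quality

# A rough map: the axes are arbitrary, so the picture may be rotated or reflected.
plot(fit, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(fit, labels = rownames(fit))
```

Because the input is a set of geographic distances, two dimensions recover them closely; with psychological dissimilarities the number of dimensions is itself a substantive question.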

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")

                                                                                                    10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                                                    11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0
[9] lattice_0.20-34

                                                                                                    References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



iq: 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web-based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton: Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid-parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer: Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous: cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
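As an illustration of how the cities distances might be used for multidimensional scaling, the following is a minimal sketch using base R's cmdscale (classical metric scaling). It assumes that psych is installed and that the data set loads as an object named cities; if psych is not available, the sketch falls back to the eurodist distances that ship with base R so that it still runs. The object names d and mds are illustrative only.

```r
# Classical (metric) multidimensional scaling of an inter-city distance matrix.
# Assumption: psych provides the `cities` data set described above; otherwise
# fall back to base R's `eurodist` so the example remains runnable.
if (requireNamespace("psych", quietly = TRUE)) {
  data("cities", package = "psych", envir = environment())
  d <- as.dist(as.matrix(cities))   # square distance matrix -> "dist" object
} else {
  d <- datasets::eurodist           # already a "dist" object
}

# Recover a two-dimensional configuration from the distances
mds <- cmdscale(d, k = 2)

# Plot the recovered configuration, labelling each point with its city name
if (interactive()) {
  plot(mds, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
  text(mds, labels = rownames(mds))
}
```

The orientation of the recovered map is arbitrary: the solution may be rotated or reflected relative to a geographic map, because only the inter-point distances are reproduced.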

                                                                                                      9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
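The same installation can also be done from the R console. The sketch below wraps the call in a small helper, install_psych_dev, which is a hypothetical name introduced here for illustration and is not part of psych. It assumes the repository address given above is reachable and that the tools needed to build a source package are installed.

```r
# Hypothetical helper (not part of psych): install the development version
# from the personality-project repository as a source package.
install_psych_dev <- function(repo = "http://personality-project.org/r") {
  utils::install.packages("psych", repos = repo, type = "source")
}

# Uncomment to run; requires network access and replaces any installed psych.
# install_psych_dev()
```

After installing, packageVersion("psych") reports which version is active.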

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")

                                                                                                      10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

                                                                                                      11 SessionInfo

                                                                                                      This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

                                                                                                        10 Psychometric Theory

                                                                                                        The psych package has been developed to help psychologists do basic research Many ofthe functions were developed to supplement a book (httppersonality-projectorgrbook An introduction to Psychometric Theory with Applications in R (Revelle prep)More information about the use of some of the functions may be found in the book

                                                                                                        For more extensive discussion of the use of psych in particular and R in general consulthttppersonality-projectorgrrguidehtml A short guide to R

                                                                                                        11 SessionInfo

This document was prepared using the following settings.

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0
[9] lattice_0.20-34
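
Results may differ slightly across versions, so it can be useful to compare a local installation with the settings listed above. The following is only a minimal sketch: the two reference version strings are copied from the session output in this section, the variable names (reference_r, reference_psych, and so on) are illustrative, and it assumes nothing beyond base R and the utils package.

```r
# Sketch: compare the local setup with the settings recorded in this section.
# The reference versions are copied from the sessionInfo() output above.
reference_r     <- "3.4.0"     # R series used to build this document
reference_psych <- "1.7.4.21"  # psych version under "other attached packages"

# Version string of the running R, built from its major and minor parts
local_r <- paste(R.version$major, R.version$minor, sep = ".")

# compareVersion() returns -1, 0, or 1 for older, equal, or newer
r_status <- utils::compareVersion(local_r, reference_r)

# psych may not be installed, so check before asking for its version
if (requireNamespace("psych", quietly = TRUE)) {
  local_psych  <- as.character(utils::packageVersion("psych"))
  psych_status <- utils::compareVersion(local_psych, reference_psych)
} else {
  local_psych  <- NA_character_
  psych_status <- NA_integer_
}

cat("R:    ", local_r, " (reference ", reference_r, ")\n", sep = "")
cat("psych:", local_psych, " (reference ", reference_psych, ")\n", sep = "")
```

A status of -1 means the local version is older than the one used here; in that case some functions described in this document may be missing or behave differently.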


                                                                                                        References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


                                                                                                              ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

                                                                                                              plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

                                                                                                              KnitR 47

                                                                                                              lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

                                                                                                              makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

                                                                                                              nfactors 6nlme 37

                                                                                                              omega 6 7outlier 3 11 12

                                                                                                              padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

                                                                                                              R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

                                                                                                              58

                                                                                                              densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

                                                                                                              irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

                                                                                                              affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

                                                                                                              59

                                                                                                              biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

                                                                                                              fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

                                                                                                              60

                                                                                                              polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

                                                                                                              rtest 28

                                                                                                              rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

                                                                                                              R package

                                                                                                              61

                                                                                                              ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

                                                                                                              rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

                                                                                                              SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

                                                                                                              spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

                                                                                                              table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

                                                                                                              vegetables 50 51violinBy 14 18vss 5 6

                                                                                                              weighted least squares 6withinBetween 37

                                                                                                              xtable 47

                                                                                                              62

• Jump starting the psych package: a guide for the impatient
• Psychometric functions are summarized in the second vignette
• Overview of this and related documents
• Getting started
• Basic data analysis
  • Getting the data by using read.file
  • Data input from the clipboard
  • Basic descriptive statistics
    • Outlier detection using outlier
    • Basic data cleaning using scrub
    • Recoding categorical variables into dummy coded variables
  • Simple descriptive graphics
    • Scatter Plot Matrices
    • Density or violin plots
    • Means and error bars
    • Error bars for tabular data
    • Two dimensional displays of means and errors
    • Back to back histograms
    • Correlational structure
    • Heatmap displays of correlational structure
  • Testing correlations
  • Polychoric, tetrachoric, polyserial, and biserial correlations
• Multilevel modeling
  • Decomposing data into within and between level correlations using statsBy
  • Generating and displaying multilevel data
  • Factor analysis by groups
• Multiple Regression, mediation, moderation, and set correlations
  • Multiple regression from data or correlation matrices
  • Mediation and Moderation analysis
  • Set Correlation
• Converting output to APA style tables using LaTeX
• Miscellaneous functions
• Data sets
• Development version and a user's guide
• Psychometric Theory
• SessionInfo

                                                                                                                                  • Set Correlation
                                                                                                                                    • Converting output to APA style tables using LaTeX
                                                                                                                                    • Miscellaneous functions
                                                                                                                                    • Data sets
                                                                                                                                    • Development version and a users guide
                                                                                                                                    • Psychometric Theory
                                                                                                                                    • SessionInfo

                                                                                                                  Index

                                                                                                                  affect, 14, 24; alpha, 5, 6

                                                                                                                  Bechtoldt.1, 50; Bechtoldt.2, 50; bfi, 26, 50; bi.bars, 6, 25, 26; bifactor, 6; biserial, 13, 34; block.random, 49; burt, 34

                                                                                                                  char2numeric, 13; circadian.cor, 49; circadian.linear.cor, 49; circadian.mean, 49; circular statistics, 49; cities, 51; cohen.kappa, 49; cor, 27; cor.smooth, 34; cor.test, 28; cor2latex, 47, 49; corPlot, 7; corr.p, 28, 32; corr.test, 28, 32; cortest, 33; cosinor, 49; ctv, 7; cubits, 50

                                                                                                                  densityBy, 14; describe, 6, 10, 49; describeBy, 3, 6, 10, 11; df2latex, 48, 49; diagram, 8; draw.cor, 34; draw.tetra, 34; dummy.code, 13; dynamite plot, 19

                                                                                                                  edit, 3; epi.bfi, 50; error bars, 19; error.bars, 6, 13, 19; error.bars.by, 10, 13, 19, 20; error.bars.tab, 19; error.crosses, 19; errorCircles, 24; errorCrosses, 23

                                                                                                                  fa, 6, 7, 48; fa.diagram, 7; fa.extension, 51; fa.multi, 6; fa.parallel, 5, 6; fa2latex, 47, 49; faBy, 38; factor analysis, 6; factor.minres, 7; factor.pa, 7; factor.wls, 7; file.choose, 8; fisherz, 49

                                                                                                                  galton, 51; generalized least squares, 6; geometric.mean, 49; GPArotation, 7; guttman, 6

                                                                                                                  harmonic.mean, 49; head, 49; headtail, 49; heights, 50; het.diagram, 7; Hmisc, 28; Holzinger, 50

                                                                                                                  ICC, 6, 49; iclust, 6; iclust.diagram, 7; Index, 49; introduction to psychometric theory with applications in R, 7; iqitems, 50; irt.fa, 6, 47; irt2latex, 47, 49

                                                                                                                  KnitR, 47

                                                                                                                  lavaan, 38; library, 8; lm, 38; lowerCor, 4, 27; lowerMat, 27; lowerUpper, 27; lowess, 14

                                                                                                                  make.keys, 14; MAP, 6; mardia, 49; maximum likelihood, 6; mediate, 4, 41, 42; mediate.diagram, 41; minimum residual, 6; mixed.cor, 34; mlArrange, 7; mlPlot, 7; mlr, 6; msq, 14; multi.hist, 6; multilevel, 37; multilevel.reliability, 6; multiple regression, 38

                                                                                                                  nfactors, 6; nlme, 37

                                                                                                                  omega, 6, 7; outlier, 3, 11, 12

                                                                                                                  p.adjust, 28; p.rep, 49; pairs, 14; pairs.panels, 3, 6, 7, 12–17; partial.r, 49; pca, 6; peas, 50, 51; plot.irt, 7; plot.poly, 7; polychoric, 6, 34; polyserial, 34; principal, 5–7; principal axis, 6; psych, 3, 5–8, 28, 47, 49–52

                                                                                                                  R package: ctv, 7; GPArotation, 7; Hmisc, 28; KnitR, 47; lavaan, 38; multilevel, 37; nlme, 37; psych, 3, 5–8, 28, 47, 49–52; Rgraphviz, 8; sem, 7, 50; stats, 28; Sweave, 47; xtable, 47

                                                                                                                  r.test, 28; rangeCorrection, 49; rcorr, 28; read.clipboard, 3, 6, 8, 9; read.clipboard.csv, 9; read.clipboard.fwf, 9; read.clipboard.lower, 9; read.clipboard.tab, 3, 9; read.clipboard.upper, 9; read.file, 3, 6, 8; read.table, 9; Reise, 50; reverse.code, 49; Rgraphviz, 8

                                                                                                                  SAPA, 26, 50, 51; sat.act, 10, 33, 44; scatter.hist, 6; schmid, 6, 7; score.multiple.choice, 6; scoreItems, 5, 6, 14; scrub, 3, 11; sector, 43; sem, 7, 50; set correlation, 44; set.cor, 49; setCor, 4, 38, 41, 44, 46, 47; sim.multilevel, 37; spider, 13; stars, 14; stats, 28; StatsBy, 6; statsBy, 6, 37, 38; statsBy.boot, 37; statsBy.boot.summary, 37; structure.diagram, 7; superMatrix, 50; Sweave, 47

                                                                                                                  table, 19; tail, 49; tetrachoric, 6, 34; Thurstone, 28, 38, 50; Thurstone.33, 50; topBottom, 49

                                                                                                                  vegetables, 50, 51; violinBy, 14, 18; vss, 5, 6

                                                                                                                  weighted least squares, 6; withinBetween, 37

                                                                                                                  xtable, 47

                                                                                                                    • Jump starting the psych packagendasha guide for the impatient
                                                                                                                    • Psychometric functions are summarized in the second vignette
                                                                                                                    • Overview of this and related documents
                                                                                                                    • Getting started
                                                                                                                    • Basic data analysis
                                                                                                                      • Getting the data by using readfile
                                                                                                                      • Data input from the clipboard
                                                                                                                      • Basic descriptive statistics
                                                                                                                        • Outlier detection using outlier
                                                                                                                        • Basic data cleaning using scrub
                                                                                                                        • Recoding categorical variables into dummy coded variables
                                                                                                                          • Simple descriptive graphics
                                                                                                                            • Scatter Plot Matrices
                                                                                                                            • Density or violin plots
                                                                                                                            • Means and error bars
                                                                                                                            • Error bars for tabular data
                                                                                                                            • Two dimensional displays of means and errors
                                                                                                                            • Back to back histograms
                                                                                                                            • Correlational structure
                                                                                                                            • Heatmap displays of correlational structure
                                                                                                                              • Testing correlations
                                                                                                                              • Polychoric tetrachoric polyserial and biserial correlations
                                                                                                                                • Multilevel modeling
                                                                                                                                  • Decomposing data into within and between level correlations using statsBy
                                                                                                                                  • Generating and displaying multilevel data
                                                                                                                                  • Factor analysis by groups
                                                                                                                                    • Multiple regression, mediation, moderation, and set correlations
                                                                                                                                      • Multiple regression from data or correlation matrices
                                                                                                                                      • Mediation and Moderation analysis
                                                                                                                                      • Set Correlation
                                                                                                                                        • Converting output to APA style tables using LaTeX
                                                                                                                                        • Miscellaneous functions
                                                                                                                                        • Data sets
                                                                                                                                        • Development version and a user's guide
                                                                                                                                        • Psychometric Theory
                                                                                                                                        • SessionInfo

densityBy 14; describe 6, 10, 49; describeBy 3, 6, 10, 11; df2latex 48, 49; drawcor 34; drawtetra 34; dummycode 13
edit 3; epibfi 50; errorbars 6, 13, 19; errorbarsby 10, 13, 19, 20; errorbarstab 19; errorcrosses 19; errorCircles 24; errorCrosses 23
fa 6, 7, 48; fadiagram 7; faextension 51; famulti 6; faparallel 5, 6; fa2latex 47, 49; faBy 38; factorminres 7; factorpa 7; factorwls 7; filechoose 8; fisherz 49
galton 51; geometricmean 49; guttman 6
harmonicmean 49; head 49; headtail 49; heights 50; hetdiagram 7; Holzinger 50
ICC 6, 49; iclust 6; iclustdiagram 7; iqitems 50; irtfa 6, 47; irt2latex 47, 49
library 8; lm 38; lowerCor 4, 27; lowerMat 27; lowerUpper 27
makekeys 14; MAP 6; mardia 49; mediate 4, 41, 42; mediatediagram 41; mixedcor 34; mlArrange 7; mlPlot 7; mlr 6; msq 14; multihist 6; multilevelreliability 6
nfactors 6
omega 6, 7; outlier 3, 11, 12
padjust 28; prep 49; pairs 14; pairspanels 3, 6, 7, 12–17; partialr 49; pca 6; peas 50, 51; plotirt 7; plotpoly 7; polychoric 6, 34; polyserial 34; principal 5–7; psych 51

psych package
  affect 14; alpha 5, 6
  Bechtoldt1 50; Bechtoldt2 50; bfi 26, 50; bibars 6, 25, 26; biserial 13, 34; blockrandom 49; burt 34
  char2numeric 13; circadiancor 49; circadianlinearcor 49; circadianmean 49; cities 51; cohenkappa 49; corsmooth 34; cor2latex 47, 49; corPlot 7; corrp 28, 32; corrtest 28, 32; cortest 33; cosinor 49; cubits 50
  densityBy 14; describe 6, 10, 49; describeBy 3, 6, 10, 11; df2latex 48, 49; drawcor 34; drawtetra 34; dummycode 13
  epibfi 50; errorbars 6, 13, 19; errorbarsby 10, 13, 19, 20; errorbarstab 19; errorcrosses 19; errorCircles 24; errorCrosses 23
  fa 6, 7, 48; fadiagram 7; faextension 51; famulti 6; faparallel 5, 6; fa2latex 47, 49; faBy 38; factorminres 7; factorpa 7; factorwls 7; fisherz 49
  galton 51; geometricmean 49; guttman 6
  harmonicmean 49; headtail 49; heights 50; hetdiagram 7; Holzinger 50
  ICC 6, 49; iclust 6; iclustdiagram 7; iqitems 50; irtfa 6, 47; irt2latex 47, 49
  lowerCor 4, 27; lowerMat 27; lowerUpper 27
  makekeys 14; MAP 6; mardia 49; mediate 4, 41, 42; mediatediagram 41; mixedcor 34; mlArrange 7; mlPlot 7; mlr 6; msq 14; multihist 6; multilevelreliability 6
  nfactors 6
  omega 6, 7; outlier 3, 11, 12
  prep 49; pairspanels 3, 6, 7, 12–17; partialr 49; pca 6; peas 50, 51; plotirt 7; plotpoly 7; polychoric 6, 34; polyserial 34; principal 5–7; psych 51
  rtest 28; rangeCorrection 49; readclipboard 3, 6, 8, 9; readclipboardcsv 9; readclipboardfwf 9; readclipboardlower 9; readclipboardtab 3, 9; readclipboardupper 9; readfile 3, 6, 8; Reise 50; reversecode 49
  satact 10, 33, 44; scatterhist 6; schmid 6, 7; scoremultiplechoice 6; scoreItems 5, 6, 14; scrub 3, 11; sector 43; setcor 49; setCor 4, 38, 41, 44, 46, 47; simmultilevel 37; spider 13; stars 14; StatsBy 6; statsBy 6, 37, 38; statsByboot 37; statsBybootsummary 37; structurediagram 7; superMatrix 50
  tetrachoric 6, 34; Thurstone 28, 50; Thurstone33 50; topBottom 49
  vegetables 50, 51; violinBy 14, 18; vss 5, 6
  withinBetween 37

R package
  ctv 7; GPArotation 7; Hmisc 28; KnitR 47; lavaan 38; multilevel 37; nlme 37; psych 3, 5–8, 28, 47, 49–52; Rgraphviz 8; sem 7, 50; stats 28; Sweave 47; xtable 47

rtest 28; rangeCorrection 49; rcorr 28; readclipboard 3, 6, 8, 9; readclipboardcsv 9; readclipboardfwf 9; readclipboardlower 9; readclipboardtab 3, 9; readclipboardupper 9; readfile 3, 6, 8; readtable 9; Reise 50; reversecode 49; Rgraphviz 8
SAPA 26, 50, 51; satact 10, 33, 44; scatterhist 6; schmid 6, 7; scoremultiplechoice 6; scoreItems 5, 6, 14; scrub 3, 11; sector 43; sem 7, 50; set correlation 44; setcor 49; setCor 4, 38, 41, 44, 46, 47; simmultilevel 37; spider 13; stars 14; stats 28; StatsBy 6; statsBy 6, 37, 38; statsByboot 37; statsBybootsummary 37; structurediagram 7; superMatrix 50; Sweave 47
table 19; tail 49; tetrachoric 6, 34; Thurstone 28, 38, 50; Thurstone33 50; topBottom 49
vegetables 50, 51; violinBy 14, 18; vss 5, 6
weighted least squares 6; withinBetween 37
xtable 47

                                                                                                                                              • Miscellaneous functions
                                                                                                                                              • Data sets
                                                                                                                                              • Development version and a users guide
                                                                                                                                              • Psychometric Theory
                                                                                                                                              • SessionInfo



