Unlock the Secrets of R
G. Janacek
Contents
1 What is R?
  1.0.1 Getting up and running
  1.0.2 Getting R
  1.0.3 A first R session
  1.0.4 An example
  1.0.5 Quitting R
2 Commands, objects and functions
  2.0.6 Basics
  2.0.7 Objects
  2.0.8 Basic objects
  2.0.9 Factors
  2.0.10 Control
  2.0.11 Repetitive execution: for loops, repeat and while
  2.0.12 Functions
  2.1 Vectors, Matrices and Data Frames
    2.1.1 Vectors
    2.1.2 Matrices
    2.1.3 The Dataframe
3 Input and Output
4 R help and documentation
  4.0.4 The Plot Command
  4.1 Saving Graphs
  4.2 Adding Packages
    4.2.1 Windows
    4.2.2 OS X
  4.3 Some examples
    4.3.1 Packages you already have
5 Appendix 1: Vocabulary
6 Appendix 2: Numerical Types
7 Regression
8 Multivariate Data
1 What is R?
The S programming language was created by John Chambers for doing statistical analysis and was later implemented in the S-PLUS system. Later Ross Ihaka and Robert Gentleman created R, named partly as a play on the name S, from which it is derived.
The source code for the R software environment is written primarily in C, Fortran and R itself, and is freely available under the GNU General Public License. Free pre-compiled binary versions are provided for various operating systems. R development is now mostly done by the R Core Team, see
http://www.r-project.org/contributors.
R is a powerful statistical programming language and has a broad set of facilities for doing statistical analyses. Because it is open source, new packages become available as new statistical techniques are developed. Consequently there is R code available to do pretty much any sort of analysis that you can think of, and if you do manage to think of a new one you can always write the code yourself.
In short R will carry out analyses that are difficult or impossible in many other packages.
R has a broad range of graph-drawing tools, which make it very easy to produce publication-standard graphs. Because R is produced by people who know about data presentation, the default options for R graphs are simple, elegant and sensible.
On top of this base graphics system there are a number of additional graph packages available that give you whole new graphics systems. For examples of the graphics have a look at
http://rgraphgallery.blogspot.co.uk.
1.0.1 Getting up and running
Learning R takes some effort. However, just like any new natural language, useful things can be done before achieving fluency. I think that the process of learning R can be broken down into the following five stages:
1. Understand something of the environment in which the R programming language is maintained. Become familiar with the resources available. Install R on your computer and run a test script.
2. Read csv files into data frames and use R functions to perform statistical analyses in a familiar area.
3. Use the basic control structures of the R language to write simple programs. Write your own functions, become familiar with the data structures included in R and begin to explore the rich features of the language. Interface with databases, web pages and other external data sources.
4. Write complex programs in the language. Develop an understanding of the deep structure of the language: S3 and S4 objects, closures etc.
5. Develop programs for production use. Write an R package.
Completing Stage 2, with a bit of Stage 3, is normally all that most people need, particularly once you become familiar with the libraries of R functions that are important to your field.
1.0.2 Getting R
You will find R on most university machines, but be aware that if you do not have administrative privileges on the machine you are using then your use will be limited. If you have your own machine then R is freely downloadable from
http://cran.r-project.org/,
as is a wide range of documentation. If you are using Windows or OS X you can download it and run the installer. It is simple and easy to get running, honest. If you use Linux then it also poses few problems.
Bear in mind everything is to be found at
http://www.r-project.org/
and
http://cran.r-project.org/
1.0.3 A first R session
So we have downloaded a copy of R. On starting the application we see something like
R version 3.0.3 (2014-03-06) -- "Warm Puppy"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin10.8.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type ’license()’ or ’licence()’ for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type ’contributors()’ for more information and
’citation()’ on how to cite R or R packages in publications.
Type ’demo()’ for some demos, ’help()’ for on-line help, or
’help.start()’ for an HTML browser interface to help.
Type ’q()’ to quit R.
[R.app GUI 1.63 (6660) x86_64-apple-darwin10.8.0]
[Workspace restored from /Users/jan/.RData]
[History restored from /Users/jan/.Rapp.history]
>
and you are faced with the command prompt without any nice buttons or menus to help you.
1.0.4 An example
This is the point where things usually start going pear-shaped for many people, so we look at a simple analysis to see what is in store.
Here we have the birth weights of babies from the excellent book by Annette Dobson.
Age  Boy   Age  Girl
40   2968  40   3317
38   2795  36   2729
40   3163  40   2935
35   2925  38   2754
36   2625  42   3210
37   2847  39   2817
41   3292  40   3126
40   3473  37   2539
37   2628  36   2412
38   3176  38   2991
40   3421  39   2875
38   2975  40   3231
Table 1: Birth weight and age in weeks
Before doing any sort of analysis we need to get our data loaded into the programme. For a big data set you would do this by entering the data into a spreadsheet or text file and importing it to R (see Input and Output later), but with a small dataset you can enter the data directly at the command prompt.
Being lazy I will just take the first two columns. We input the age as follows:
> age<-scan()
1: 40 38 40 35 36 37 41 40 37 38 40 38
13:
Read 12 items
> boy<-scan()
1: 2968 2795 3163 2925 2625 2847 3292 3473 2628 3176 3421 2975
13:
Read 12 items
>
The command scan() reads data into the system until it gets a blank line. You can cut and paste! Thus we have created two sets of numbers (vectors) in R called age and boy. The left arrow
<-
means take what’s to the right of the arrow and make an object with the name that’s to the left of the arrow.
If you prefer, use =. You can check that an object is correct by just typing its name and pressing enter.
> age
[1] 40 38 40 35 36 37 41 40 37 38 40 38
>
Now your data are entered into R you can take a look at them. It’s always a good idea to visualise your data before doing any analysis, and you can do this by asking R to plot the data out for you. Type
plot( age,boy)
and get a scatterplot. Not quite publication quality but not bad.
While we are at it we can try a linear regression of boy on age. Regression is lm (linear model) in R speak, so using lm we get
> rbaby=lm(boy~age) # Note that nothing is printed!
>
> summary(rbaby)
Call:
lm(formula = boy ~ age)
Residuals:
    Min      1Q  Median      3Q     Max
-246.69 -151.20  -29.16  194.59  274.28
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1268.67    1239.97  -1.023  0.33035
age           111.98      32.31   3.466  0.00606 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 200.9 on 10 degrees of freedom
Multiple R-squared:  0.5457, Adjusted R-squared:  0.5003
F-statistic: 12.01 on 1 and 10 DF,  p-value: 0.006065
# more plots are available
> plot(rbaby)
This has everything you need. It tells you exactly what sort of analysis has been done, and the names of the variables you asked it to analyse.
You get a significance test and a p-value, telling you that the coefficient of age is statistically significant.
That’s it. Data have been input, we have drawn a graph and carried out a statistical test.
This has illustrated some important points about how R works. You do things by typing commands and pressing enter. We are using a statistical language, so
boy~age
means that boy is related to age. You can create objects such as rbaby which contain your results, and give commands that make R do things like carry out tests and draw graphs. Some of these commands have self-evident names like plot, but be assured many do not.
You will find that if you make mistakes in what you are typing the error messages are rarely helpful.
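Once a model object exists, its pieces can be pulled out and reused rather than read off the printed summary. The following is a minimal sketch using the age and boy data entered above; the value of 39 weeks used for the prediction is an arbitrary choice for illustration.

```r
# Data from the example above (first two columns of Table 1)
age <- c(40, 38, 40, 35, 36, 37, 41, 40, 37, 38, 40, 38)
boy <- c(2968, 2795, 3163, 2925, 2625, 2847, 3292, 3473, 2628, 3176, 3421, 2975)

rbaby <- lm(boy ~ age)

# coef() returns the fitted intercept and slope as a named vector
cf <- coef(rbaby)
print(cf)

# predict() applies the fitted line to new data
# (39 weeks is an arbitrary illustrative value)
pred <- predict(rbaby, newdata = data.frame(age = 39))
print(pred)
```

The prediction is just the intercept plus 39 times the slope, so coef() and predict() can be checked against each other.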
1.0.5 Quitting R
To quit an R session use the function call q(). When you use R you create a workspace in which all the objects you have created are stored. When you try to quit you will be asked if you wish to save your workspace (session).
2 Commands, objects and functions
We now introduce some of the fundamental concepts that you need to know to be a confident user of R.
2.0.6 Basics
You interact with R by typing something into the console at the command prompt and pressing enter. If what you have typed makes sense to R it will do what it’s told to do; if not you will get an error message. The simplest things you can get R to do are straightforward sums. Just type in a calculation and press enter.
> 1+2+3
[1] 6
> 27*12
[1] 324
> 3/7
[1] 0.4285714
> 5^5
[1] 3125
> (1/4+2+3)*2.7*12*(-0.003^7)
[1] -3.720087e-16
>
Why does the answer have a [1] in front? We shall see later. When you’re asking R to do calculations that are more complex than just a single arithmetic operation it will carry them out in the usual PEMDAS order (parentheses, exponents, multiplication and division, addition and subtraction), working from left to right.
Most sensible people use brackets!
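A short sketch of how precedence plays out in practice; the numbers are chosen purely for illustration. Note that the power operator binds more tightly than a leading minus sign, which is why an expression such as -0.003^7 gives a negative answer.

```r
# Multiplication and division are done before addition and subtraction
a <- 2 + 3 * 4        # 3 * 4 first, then add 2
b <- (2 + 3) * 4      # brackets force the addition first

# The power operator binds more tightly than unary minus
c1 <- -2^2            # read as -(2^2)
c2 <- (-2)^2          # brackets change the meaning

print(c(a, b, c1, c2))
```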
These simple calculations do not produce any kind of output that is remembered by R: the answers just appear in the console window. If you want to do further calculations with the answer to a calculation you need to give it a name and tell R to store it as an object.
We might wish to store the answer to a calculation, say add 1 and 2 and store the answer in an object called example1. To find out what is stored in example1, just type the name of the object:
> example1=1+2
> example1
[1] 3
> example2<- 1+2
> example2
[1] 3
>
Take particular note of the symbol <- in the second instruction above. This is the assign symbol, formed of a less-than sign and a hyphen, and it looks like a left arrow. It means make the object on the left into the output of the command on the right. It also works the other way around: 2+2->answer. You can also use an equals sign = for assignment but it can cause confusion with other uses of the equals sign.
It’s quite common when using R to type in a command and see nothing happen, except for the command prompt popping up again. When nothing seems to happen that means that there have been no errors, at least as far as R is concerned, which implies that everything has gone smoothly.
2.0.7 Objects
We have created objects like example1 and example2, which are just variables.
Note that when we did the regression above using lm we created an object rbaby which contained the results. To see the results contained in an object we used the command summary. To examine what is inside an object we can also use attributes() and then look inside using the dollar symbol
> attributes(rbaby)
$names
[1] "coefficients" "residuals" "effects" "rank" "fitted.values"
[6] "assign" "qr" "df.residual" "xlevels" "call"
[11] "terms" "model"
$class
[1] "lm"
> # Now we can look at each element (a hash starts a comment)
> rbaby$qr
$qr
(Intercept) age
1 -3.4641016 -132.7905619
2 0.2886751 6.2182527
3 0.2886751 -0.2079874
4 0.2886751 0.5960970
5 0.2886751 0.4352802
6 0.2886751 0.2744633
7 0.2886751 -0.3688042
8 0.2886751 -0.2079874
9 0.2886751 0.2744633
10 0.2886751 0.1136464
11 0.2886751 -0.2079874
12 0.2886751 0.1136464
attr(,"assign")
[1] 0 1
$qraux
[1] 1.288675 1.113646
$pivot
[1] 1 2
$tol
[1] 1e-07
$rank
[1] 2
attr(,"class")
[1] "qr"
> rbaby$coefficients
(Intercept) age
-1268.6724 111.9828
>
You can use objects in calculations in exactly the same way as the numbers used above. You can also store the results of a calculation done with objects as another object.
When you first open R there are no objects stored, but after a while you might have lots. You can get a list of what’s there by using the command ls().
You can remove an object from R’s memory by using the rm() function. Notice that when you type this it does not ask you if you are sure, or give you any other sort of warning, nor does it let you know whether it has done as you asked.
The object you asked it to remove has just gone: you can confirm this by using ls() again. If you try to delete a non-existent object you get a warning message.
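The workspace commands can be tried out safely on a couple of throwaway objects. This is only a sketch; the names tmp1 and tmp2 are made up for the illustration.

```r
# Create two throwaway objects (the names are purely illustrative)
tmp1 <- 1 + 2
tmp2 <- tmp1 * 10

# ls() returns the names of the objects in the workspace
print(ls())

# exists() checks for a single object by name
before <- exists("tmp1")

# rm() removes the object without asking for confirmation
rm(tmp1)
after <- exists("tmp1")

print(c(before, after))
```

Note that tmp2 keeps its value after tmp1 is removed: it stored the result of the calculation, not a link to tmp1.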
2.0.8 Basic objects.
Internally R works with collections of objects, thus the basic numerical object is an ordered collection of numbers (a vector) and a single number is a vector of length 1. Hence the [1]’s appearing above when we print a number.
There are six basic (atomic) vector types: logical, integer, real, complex, string (or character) and raw. The modes and storage modes for the different vector types are given in the table in the appendix.
Single numbers, such as 4.2, and strings, such as “four point two”, are still vectors, of length 1; there are no more basic types. Vectors with length zero are possible (and useful). R can use strings of characters as objects. These have to be entered with quote marks around them because otherwise R will think that they’re the names of objects and return an error when it can’t find them. So
> name=c("brodwin","blodwin","r")
> name
[1] "brodwin" "blodwin" "r"
>
> w="four point 2"
> w
[1] "four point 2"
> w=c("A","B","C","D")
> w
[1] "A" "B" "C" "D"
Note: I have got ahead of myself and used the concatenate command c. This sticks elements together as above.
String vectors have mode and storage mode “character” and a single element of a character vector is often referred to as a character string.
You can also have TRUE, FALSE, NA and NaN. TRUE and FALSE are the obvious logical values, NA marks missing data and NaN is “not a number”, the result of an undefined calculation such as 0/0.
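A brief sketch of the vector types just listed, using typeof() to report how each vector is stored. It also separates two values that are easy to confuse: NA, which marks missing data, and NaN, which results from an undefined calculation such as 0/0. The variable names are invented for the illustration.

```r
# One example of each common atomic type
v_logical   <- c(TRUE, FALSE)
v_integer   <- 1:3            # the colon operator gives integers
v_double    <- c(4.2, 1)      # "real" numbers are stored as doubles
v_complex   <- 1 + 2i
v_character <- "four point two"

types <- sapply(list(v_logical, v_integer, v_double, v_complex, v_character), typeof)
print(types)

# NA marks a missing value; NaN comes from an undefined calculation
z <- c(1, NA, 0/0)
print(is.na(z))    # flags both NA and NaN
print(is.nan(z))   # flags only NaN

# Many functions need to be told to skip missing values
m <- mean(c(1, NA, 3), na.rm = TRUE)
print(m)
```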
2.0.9 Factors
A special type of data in R is a factor. When we’re collecting data we might record whether a subject is male or female, or whether a cricket is winged, wingless or intermediate. This type of data, where things are divided into classes, is called categorical or nominal data, and in R it is stored as a factor. We can input nominal data into R as numbers if we assign a number to each category, such as 1=red, 2=green and 3=blue, and then tell R to make it a factor with the factor() function, but this can lead to confusion. Usually it’s better to input data like this as the words themselves, as character data, and then tell R to make it a factor. So
> gender =c("female","female","female","female","male","male","male")
> gender=factor(gender)
> gender
[1] female female female female male male male
Levels: female male
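If the data have already been recorded as numeric codes, the confusion mentioned above can be reduced by attaching labels at the moment of conversion. A minimal sketch, with made-up codes following the 1=red, 2=green, 3=blue scheme:

```r
# Categories recorded as numeric codes (1 = red, 2 = green, 3 = blue)
codes <- c(1, 3, 2, 1, 1, 3)

# Attach labels while converting, so the meaning of each code is kept
colour <- factor(codes, levels = 1:3, labels = c("red", "green", "blue"))
print(colour)

# levels() lists the categories, table() counts observations in each
print(levels(colour))
counts <- table(colour)
print(counts)

# as.integer() recovers the underlying codes
print(as.integer(colour))
```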
2.0.10 Control
If statements: The language has available a conditional construction of the form
> if (expr_1) expr_2 else expr_3
where expr_1 must evaluate to a single logical value and the result of the entire expression is then evident.
> x=3
> if (x>0)(sqrt(x))else(NA)
[1] 1.732051
> x=-2
> if (x>0)(sqrt(x))else(NA)
[1] NA
>
The short-circuit operators && (“and”) and || (“or”) are often used as part of the condition in an if statement. Whereas & and | apply element-wise to vectors, && and || apply to vectors of length one, and only evaluate their second argument if necessary.
There is a vectorized version of the if/else construct, the ifelse function. This has the form
ifelse(condition, a, b)
and returns a vector of the same length as condition, with elements a[i] if condition[i] is true, otherwise b[i].
> x=rnorm(20)
> x
[1] 1.07946911 -0.38368774 0.59394725 -0.59558286 1.14503714 0.85581439 0.31156696 -0.85308259
[9] 0.86578634 -0.87393923 -1.06192024 -0.61750575 0.05840533 -0.92554621 2.16806721 0.40747540
[17] -0.04350849 -0.63534693 0.61456348 -0.30498590
> ifelse(x>0 ,sqrt(x),sqrt(-x))
[1] 1.0389750 0.6194253 0.7706797 0.7717401 1.0700641 0.9251024 0.5581818 0.9236247 0.9304764
[10] 0.9348472 1.0304951 0.7858153 0.2416720 0.9620531 1.4724358 0.6383380 0.2085869 0.7970865
[19] 0.7839410 0.5522553
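The difference between the element-wise and short-circuit operators is easiest to see on a small vector. The sketch below uses made-up values. It also shows one way round a feature of ifelse(): both alternatives are evaluated on the whole vector, so a call such as sqrt(x) on negative values gives a "NaNs produced" warning even though those results are never used.

```r
x <- c(4, -9, 16, -25)

# & works element by element and returns a vector
both <- (x > 0) & (x < 10)
print(both)

# && works on single values; the right-hand side is skipped when the left is FALSE
first <- x[1]
ok <- is.numeric(first) && first > 0
print(ok)

# Taking abs() before sqrt() avoids the warning from negative values;
# here the sign of the original value is put back afterwards
root <- ifelse(x > 0, sqrt(abs(x)), -sqrt(abs(x)))
print(root)
```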
2.0.11 Repetitive execution:for loops, repeat and while
There is also a for loop construction which has the structure
> for (name in expr_1) expr_2
where name is the loop variable, expr_1 is a vector expression (perhaps a sequence like 1:20), and expr_2 is often a grouped expression with its sub-expressions written in terms of the dummy name. expr_2 is repeatedly evaluated as name ranges through the values in the vector result of expr_1.
> for (i in 1:10){
+ x=i^2
+ print(x)}
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36
[1] 49
[1] 64
[1] 81
[1] 100
Warning: for() loops are used in R code much less often than in compiled languages. Code that takes a whole object view is likely to be both clearer and faster in R. Thus
> x=1:10
> x
[1] 1 2 3 4 5 6 7 8 9 10
> sum(x)
[1] 55
Functions such as lapply(), tapply() and sapply() take the same whole-object view.
Other looping facilities include the repeat expr statement and the while (condition) expr statement. The break statement can be used to terminate any loop, possibly abnormally. This is the only way to terminate repeat loops.
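The remaining looping constructs can be sketched in a few lines. The stopping values (100 and 5) are arbitrary, and the sapply() call simply reproduces the squares computed by the for loop earlier.

```r
# while: keep doubling until the value passes 100
n <- 1
while (n <= 100) {
  n <- n * 2
}
print(n)

# repeat: loops forever unless break is called
count <- 0
repeat {
  count <- count + 1
  if (count >= 5) break
}
print(count)

# sapply: apply a function to each element and collect the results
squares <- sapply(1:10, function(i) i^2)
print(squares)

# The whole-object version needs no loop at all
print((1:10)^2)
```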
2.0.12 Functions
You can get so far by typing in calculations, but that is not much use for most statistical analyses. Remember that while R is really a programming language it comes with a huge variety of (mostly) short ready-made pieces of code that will do things like manage your data, do complex mathematical operations on your data, draw graphs and carry out statistical analyses ranging from the simple and straightforward to the eye-wateringly complex.
These ready-made pieces of code are called functions. A function is called by following its name with a pair of brackets, e.g. lm(), and for many of the more straightforward functions you just type the name of the function and put the name of the object you would like the procedure carried out on in the brackets.
You can carry out more complex calculations by making the argument of the function (the bit between the brackets) a calculation itself:
> log(3)
[1] 1.098612
> log(3*5/13)
[1] 0.1431008
> log(sin(3))
[1] -1.958145
You can use functions in creating new objects. In our example above the function lm (linear model) was used to create an object rbaby while plot provided a plot.
One of the problems with R is learning the names of the functions. It is a bit like magic: you need the name of the spell! You need a dictionary or a crib to get you going, so you need to do some reading to acquire a vocabulary.
A function is defined by an assignment of the form
name <- function(arg_1, arg_2, ...) expression
The expression is an R expression (usually a grouped expression, i.e. { ... }) that uses the arguments to calculate a value. The value of the expression is the value returned for the function. A call to the function then usually takes the form
name(expr_1, expr_2, ...)
and may occur anywhere a function call is legitimate.
A nice thing about most R functions is that they have default values specified for most of their arguments, and if nothing is specified the function will just use the default value.
As an example, consider a function to calculate the two sample t-statistic, showing all the steps. This is an artificial example, of course, since there are other, simpler ways of achieving the same end. The function is defined as follows:
> twosam <- function(y1, y2) {
n1 <- length(y1); n2 <- length(y2)
yb1 <- mean(y1); yb2 <- mean(y2)
s1 <- var(y1); s2 <- var(y2)
s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
tst
}
With this function defined, you could perform two sample t-tests using a call such as
tstat <- twosam(data$male, data$female); tstat
Note that any ordinary assignments done within the function are local and temporary and are lost after exit from the function. Thus the assignment
yb1<-mean(y1)
does not create or change any object called yb1 in the calling program.
To understand completely the rules governing the scope of R assignments the reader needs to be familiar with the notion of an evaluation frame. This is a somewhat advanced, though hardly difficult, topic and is not covered further here.
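As a check on twosam, the sketch below applies it to the boy and girl birth weights of Table 1 and compares the answer with the built-in t.test() under the equal-variance assumption, which is the "simpler way" alluded to above. The function scale_by at the end is a hypothetical example, included only to show a default argument value.

```r
# The function from the text
twosam <- function(y1, y2) {
  n1 <- length(y1); n2 <- length(y2)
  yb1 <- mean(y1); yb2 <- mean(y2)
  s1 <- var(y1); s2 <- var(y2)
  s <- ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
  (yb1 - yb2) / sqrt(s * (1 / n1 + 1 / n2))
}

# Birth weights from Table 1
boy  <- c(2968, 2795, 3163, 2925, 2625, 2847, 3292, 3473, 2628, 3176, 3421, 2975)
girl <- c(3317, 2729, 2935, 2754, 3210, 2817, 3126, 2539, 2412, 2991, 2875, 3231)

tstat <- twosam(boy, girl)

# The built-in equivalent: a t-test assuming equal variances
builtin <- t.test(boy, girl, var.equal = TRUE)$statistic
print(c(tstat, builtin))

# yb1 was local to twosam, so it does not exist out here
print(exists("yb1"))

# A hypothetical function with a default argument value
scale_by <- function(x, factor = 2) x * factor
print(scale_by(5))               # uses the default factor
print(scale_by(5, factor = 10))  # overrides it
```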
2.1 Vectors, Matrices and Data Frames
When we’re analysing experimental data, of course, we are likely to be working with lots of numbers, and R is especially good at dealing with objects that are groups of numbers, or groups of character or logical data. In the case of numbers these groups can be organised as sequences (vectors) or as two-dimensional tables of numbers (matrices). R can also deal with tables that have some columns of numbers and some columns with other kinds of data: these are called data frames.
2.1.1 Vectors
We have already used the concatenate function c
> x=c(1,3,5,7,9)
> x
[1] 1 3 5 7 9
which creates a new object called x (a vector in this case) containing the odd numbers from 1 to 9. To see what x is just type its name. There are other ways to set up vectors. One of the most important uses the function called
seq(from,to,by)
which produces sequences of numbers. We could write out the command in full, with names for all the arguments, as seq(from=1,to=10,by=1), but because R knows that the first argument between the brackets corresponds to the from= argument, the second one to the to= argument, and the default value for by= is 1, we can write a much shorter instruction.
> y=seq(1,10)
> y
[1] 1 2 3 4 5 6 7 8 9 10
The full argument list of seq, with its defaults, is
seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
length.out = NULL, along.with = NULL, ...)
so the short instruction above is equivalent to
y=seq(from=1,to=10,by=1)
We can refer to elements or slices of a vector. Here x is the vector c(1,3,5,7,9) created above:
> x[3]
[1] 5
> x[2:4]
[1] 3 5 7
> x[5:1]
[1] 9 7 5 3 1
> x[11]
[1] NA
Notice the NA for an element that does not exist. Arithmetic on vectors is carried out element by element (y here is the vector 1:10):
> x*2
[1] 2 6 10 14 18
> x-y[1:5]
[1] 0 1 2 3 4
> x^3
[1] 1 27 125 343 729
>
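A few further ways of building and indexing vectors, sketched on the same vector of odd numbers. The step of 2, the five points between 0 and 1, and the threshold of 4 are arbitrary illustrative choices.

```r
x <- c(1, 3, 5, 7, 9)

# seq() with a step, and with a fixed number of points
odds  <- seq(1, 9, by = 2)
fifth <- seq(0, 1, length.out = 5)
print(odds)
print(fifth)

# A negative index drops elements
print(x[-1])

# A logical condition picks out the elements for which it is TRUE
big <- x[x > 4]
print(big)

# which() gives the positions rather than the values
pos <- which(x > 4)
print(pos)
```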
2.1.2 Matrices
When we have data that are arranged in two dimensions rather than one we have a matrix. We can set one up using the function matrix
> m1=matrix(data=1:20,nrow=5,ncol=4,dimnames=list(c("A","B","C","D","E")))
> m1
[,1] [,2] [,3] [,4]
A 1 6 11 16
B 2 7 12 17
C 3 8 13 18
D 4 9 14 19
E 5 10 15 20
>
One thing to notice about this is that the default option in R is to fill a matrix up in column order rather than row order. This can be reversed by using the
byrow=TRUE
argument. Be careful about this when setting matrices up because it’s easy to make a mistake. We can shorten the command by relying on the order of the arguments:
> m2=matrix(1:20,5,4)
> m2
[,1] [,2] [,3] [,4]
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
>
> m1[1,2]
A
6
> m1[2:4,1:3]
[,1] [,2] [,3]
B 2 7 12
C 3 8 13
D 4 9 14
> m1[,1]
A B C D E
1 2 3 4 5
> m1[1,]
[1] 1 6 11 16
>
We can carry out simple arithmetic with our matrix just as we can with a vector.
> m1=matrix(1:4,2,2)
> m1
[,1] [,2]
[1,] 1 3
[2,] 2 4
> m2=matrix(1:6,2,3)
> m2
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> m3=matrix(c(-2,0,11,3),2,2)
> m3
[,1] [,2]
[1,] -2 11
[2,] 0 3
> m1+m1
[,1] [,2]
[1,] 2 6
[2,] 4 8
> m1-m2
Error in m1 - m2 : non-conformable arrays
> m1-m3
[,1] [,2]
[1,] 3 -8
[2,] 2 1
> m1%*%m3
[,1] [,2]
[1,] -2 20
[2,] -4 34
> m1%*%m2
[,1] [,2] [,3]
[1,] 7 15 23
[2,] 10 22 34
> solve(m1)
[,1] [,2]
[1,] -2 1.5
[2,] 1 -0.5
> m1%*%solve(m1)
[,1] [,2]
[1,] 1 0
[2,] 0 1
We can use matrices as arguments for functions.
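As a sketch of matrices being passed to functions, the example below fills a small matrix by row and then uses dim(), t() and apply(). The matrix m is made up for the illustration.

```r
# Fill by row instead of the default column order
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(m)

# dim() gives the number of rows and columns; t() transposes
print(dim(m))
print(t(m))

# apply() runs a function over rows (1) or columns (2)
row_totals <- apply(m, 1, sum)
col_means  <- apply(m, 2, mean)
print(row_totals)
print(col_means)
```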
2.1.3 The Dataframe
Very often we have data in a tabular form, for example the birth weights again:
Age  Boy   Age  Girl
40   2968  40   3317
38   2795  36   2729
40   3163  40   2935
35   2925  38   2754
36   2625  42   3210
37   2847  39   2817
41   3292  40   3126
40   3473  37   2539
37   2628  36   2412
38   3176  38   2991
40   3421  39   2875
38   2975  40   3231
Table 2: Birth weight and age in weeks
For analysis we might prefer a single table with one row per baby and columns Gender, Age and Weight.
If this is in a spreadsheet, for example Excel, we can save it as a .csv file and then read the file into a data frame. Data frames are tightly coupled collections of variables which share many of the properties of matrices and of lists, and are the fundamental data structure used by most R functions. Less formally, a data frame is a type of table where the typical use employs the rows as observations and the columns as variables.
> baby=read.csv(file.choose(),header=TRUE)
> baby
Gender Age Weight
1 M 40 2968
2 M 38 2795
3 M 40 3163
4 M 35 2925
5 M 36 2625
6 M 37 2847
7 M 41 3292
8 M 40 3473
9 M 37 2628
10 M 38 3176
11 M 40 3421
12 M 38 2975
13 F 40 3317
14 F 36 2729
15 F 40 2935
16 F 38 2754
17 F 42 3210
18 F 39 2817
19 F 40 3126
20 F 37 2539
21 F 36 2412
22 F 38 2991
23 F 39 2875
24 F 40 3231
>
Note
• that R versions before 4.0.0 assume that the character vectors going into the new data frame are factors and make them so; from R 4.0.0 onwards you need to ask for this with stringsAsFactors=TRUE.
• The file.choose() command is for people like me who cannot remember which directory contains the data file. Clever people use the path.
We can select part of the data frame by using indices
> baby[2:3,]
Gender Age Weight
2 M 38 2795
3 M 40 3163
> baby[1,]
Gender Age Weight
1 M 40 2968
> baby[,2]
[1] 40 38 40 35 36 37 41 40 37 38 40 38 40 36 40 38 42 39 40 37 36 38 39 40
>
or
> baby$Age
[1] 40 38 40 35 36 37 41 40 37 38 40 38 40 36 40 38 42 39 40 37 36 38 39 40
or
> subset(baby,Weight>3000)
Gender Age Weight
3 M 40 3163
7 M 41 3292
8 M 40 3473
10 M 38 3176
11 M 40 3421
13 F 40 3317
17 F 42 3210
19 F 40 3126
24 F 40 3231
> subset(baby,Gender=="F")
Gender Age Weight
13 F 40 3317
14 F 36 2729
15 F 40 2935
16 F 38 2754
17 F 42 3210
18 F 39 2817
19 F 40 3126
20 F 37 2539
21 F 36 2412
22 F 38 2991
23 F 39 2875
24 F 40 3231
Now we have a data frame we find that we cannot use the variables inside the frame directly by name. To access the data we need to do one of two things
1. Attach the data frame. This tells R what is in the frame so that it can be used
> attach(baby)
> Weight
[1] 2968 2795 3163 2925 2625 2847 3292 3473 2628 3176 3421 2975 3317 2729 2935 2754 3210
[18] 2817 3126 2539 2412 2991 2875 3231
2. The alternative is to be explicit and use commands like
> baby$Weight
[1] 2968 2795 3163 2925 2625 2847 3292 3473 2628 3176 3421 2975 3317 2729 2935 2754 3210
[18] 2817 3126 2539 2412 2991 2875 3231
> baby[3]
Weight
1 2968
2 2795
3 3163
4 2925
5 2625
6 2847
7 3292
8 3473
9 2628
10 3176
11 3421
12 2975
13 3317
14 2729
15 2935
16 2754
17 3210
18 2817
19 3126
20 2539
21 2412
22 2991
23 2875
24 3231
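For experimenting without a data file, the same data frame can be built directly with data.frame(). The sketch below does this and then uses with(), which evaluates an expression inside a data frame and so offers a third route alongside attach() and the dollar notation. Gender is converted to a factor explicitly so that the result does not depend on the R version.

```r
# Build the data frame directly instead of reading it from a file
baby <- data.frame(
  Gender = factor(rep(c("M", "F"), each = 12)),
  Age    = c(40, 38, 40, 35, 36, 37, 41, 40, 37, 38, 40, 38,
             40, 36, 40, 38, 42, 39, 40, 37, 36, 38, 39, 40),
  Weight = c(2968, 2795, 3163, 2925, 2625, 2847, 3292, 3473, 2628, 3176, 3421, 2975,
             3317, 2729, 2935, 2754, 3210, 2817, 3126, 2539, 2412, 2991, 2875, 3231)
)

# str() summarises the columns and their types
str(baby)

# with() evaluates an expression inside the data frame, without attach()
mean_weight <- with(baby, mean(Weight))
print(mean_weight)

# Mean weight for each gender
by_gender <- tapply(baby$Weight, baby$Gender, mean)
print(by_gender)

# Count the rows heavier than 3000, as selected by subset() earlier
n_heavy <- nrow(subset(baby, Weight > 3000))
print(n_heavy)
```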
3 Input and Output
As you might expect there are lots of complex ways to get data. We have seen how to use scan but a more useful approach is to use .csv files as above. The command is
read.csv(file, header = TRUE, sep = ",", quote = "\"",
dec = ".", fill = TRUE, comment.char = "", ...)
I have restricted discussion of input but there are several other possibilities, as the foreign package allows one to import data in several other formats, e.g. Stata. For example
> require(foreign)
Loading required package: foreign
> require(MASS)
Loading required package: MASS
> cdata <- read.dta("http://www.ats.ucla.edu/stat/data/crime.dta")
> # This loads a Stata-formatted data set held at UCLA
You can write data using
write(x, file = "data",
ncolumns = if(is.character(x)) 1 else 5,
append = FALSE, sep = " ")
Beware you may have to transpose your data matrix.
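The write() function shown above is aimed at vectors and matrices. For data frames the natural counterpart of read.csv() is write.csv(), which is not mentioned in the text but is part of base R. A minimal round-trip sketch, using a scratch file from tempfile() so that nothing is left behind; in practice you would give a real path.

```r
# A small data frame to write out (the first rows of the birth weight data)
d <- data.frame(Age = c(40, 38, 40), Weight = c(2968, 2795, 3163))

# tempfile() gives a scratch file name; substitute a real path in practice
f <- tempfile(fileext = ".csv")

# row.names = FALSE stops R adding a column of row numbers
write.csv(d, file = f, row.names = FALSE)

# Read it back and compare with the original
d2 <- read.csv(f, header = TRUE)
print(d2)
print(all.equal(d, d2))

unlink(f)   # remove the scratch file
```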
4 R help and documentation
You are probably beginning to see a problem in using R: you have to know the name of the function you would like to use. While there are some prototype GUI interfaces you will have to resign yourself to the UNIX command-driven world. The appendices contain copies of some of the CRAN crib sheets but in addition the help system can be useful.
The apropos() command is convenient when you are not sure that you know the name of a function, for example if you were after a stem and leaf function but were not sure if the name was stem or stemandleaf. Try
> apropos("stem")
[1] "stem" "system" "system.file" "system.time"
The help system may be used in several ways:
• Type help.start() at the R command line. This brings up an html version of the help system. (The Windows and Macintosh versions of the help system also contain information specific to those environments.) Within the help system, in particular:
“An Introduction to R”
is the definitive, quite advanced, reference manual intended for those with fairly substantial statistical knowledge;
• The help pages of the R base and stats packages document all the main R functions.
• Help on individual functions and datasets is also available from the R command line, so for help on the function plot type ?plot or help(plot);
• help.search() or ?? for finding help pages on a vague topic;
• library() for listing available packages and the help objects they contain.
• data() for listing available data sets;
• Under R for Windows, the entire help system is also available from the Help menu. This also has pdf versions of “An Introduction to R” and other manuals.
• Don’t forget Google etc.
The latest version of the entire help system in printable form, together with further contributed documentation and tutorials, is also available from CRAN.
We can look at some samples using inbuilt datasets. Try help(data). To start with a histogram, we try help(hist)
> # Simple Histogram
> hist(mtcars$mpg)
> # Colored Histogram with Different Number of Bins
> hist(mtcars$mpg, breaks=20, col="green")
hist(mtcars$mpg, breaks=20, col="green",xlab="mpg")
hist(mtcars$mpg, breaks=20, col="green",xlab="mpg",main="MPG")
You may prefer density plots, so help(density)
> # Kernel Density Plot
> d <- density(mtcars$mpg) # returns the density data
> plot(d) # plots the results
# Filled Density Plot
d <- density(mtcars$mpg)
plot(d, main="Kernel Density of Miles Per Gallon")
polygon(d, col="red", border="blue")
Try help(barplot) and help(boxplot)
> attach(baby)
> boxplot(Weight~Gender)
Weight~Gender is an example of a formula: a boxplot of Weight is generated for each value of Gender.
4.0.4 The Plot Command
By default, the plot() function plots the (x,y) points, using the names x and y to label the axes. However if you try the help system for plot you will find it is very flexible. You can
• choose between points and lines by setting type="p" or type="l"
• choose the plotting symbol by setting pch= to a number or a character.
• label axes
• give a title
type can take the following values:
type  description
p     points
l     lines
o     overplotted points and lines
b, c  points (empty if "c") joined by lines
s, S  stair steps
h     histogram-like vertical lines
n     does not produce any points or lines
The commands lines and points have a similar effect, but they will only overplot on a graph which already exists. They CANNOT produce a plot ab initio.
You will find the par() command useful. We only point out one property: par(mfrow=c(r,s)) sets up an r × s array, and the next rs graphs become elements of the array.
> plot(Age,Weight)
# try with filled points
> plot(Age,Weight,pch=20)
# use colour to differentiate gender
> plot(Age,Weight,col=as.integer(Gender),pch=20)
# Or perhaps Letters
> plot(Age,Weight,pch=as.character(Gender))
# Add colour
> plot(Age,Weight,pch=as.character(Gender),col= as.integer(Gender))
# Try and simplify
> flag=as.integer(Gender)
> flag
[1] 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1
> plot(Age,Weight,pch=20,col=flag)
# Add a legend
> legend("topleft",legend=c("Female","Male"),pch=20,col=c(1,2))
# And if your journal is Black and white
> plot(Age,Weight,pch=flag+19)
> legend("topleft",legend=c("Female","Male"),pch=c(20,21))
Of course, you can have multiple graphs on one page:
par(mfrow=c(2,2))
plot(Age,Weight,pch=21)
plot(Age,Weight,col=as.integer(Gender),pch=20)
plot(Age,Weight,pch=as.character(Gender))
plot(Age,Weight,col=as.integer(Gender),pch=as.character(Gender))
par(mfrow=c(1,1))
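The session above relies on the baby data. As a self-contained sketch of the same steps, the following uses the built-in mtcars data, with the transmission type am standing in for Gender; the names gear_type, flag and old_par are introduced here purely for illustration.

```r
# am is coded 0 = automatic, 1 = manual; turn it into a labelled factor
gear_type <- factor(mtcars$am, labels = c("Automatic", "Manual"))
flag <- as.integer(gear_type)          # 1 for Automatic, 2 for Manual

old_par <- par(mfrow = c(1, 2))        # one row, two columns of plots

# Left panel: colour identifies the group
plot(mtcars$wt, mtcars$mpg, pch = 20, col = flag,
     xlab = "Weight", ylab = "mpg", main = "Colour by group")
legend("topright", legend = levels(gear_type), pch = 20, col = 1:2)

# Right panel: type = "n" draws empty axes, then points() adds to them
plot(mtcars$wt, mtcars$mpg, type = "n",
     xlab = "Weight", ylab = "mpg", main = "Symbol by group")
points(mtcars$wt, mtcars$mpg, pch = flag + 19)   # pch 20 and 21
legend("topright", legend = levels(gear_type), pch = c(20, 21))

par(old_par)                           # restore the previous layout
```

Saving the value returned by par() and restoring it afterwards does the same job as the final par(mfrow=c(1,1)) above, without having to remember what the old setting was.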
4.1 Saving Graphs
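You can save the graph in a variety of formats from the menu File -> Save As. You can also send output to a file with one of the following functions; call dev.off() when the plot is finished to close the file.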
1. pdf("mygraph.pdf") pdf file
2. win.metafile("mygraph.wmf") windows metafile
3. png("mygraph.png") png file
4. jpeg("mygraph.jpg") jpeg file
5. bmp("mygraph.bmp") bmp file
6. postscript("mygraph.ps") postscript file
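A minimal sketch of the open, plot, close pattern these functions share. The file is written to the session's temporary directory so the example does not clutter the working directory; that location and the name out_file are assumptions of this sketch.

```r
out_file <- file.path(tempdir(), "mygraph.pdf")

pdf(out_file)                      # open the file as the graphics device
hist(mtcars$mpg, main = "MPG")     # draw as usual; nothing appears on screen
invisible(dev.off())               # close the device so the file is completed

file.exists(out_file)
```

The same pattern applies to png(), jpeg() and the other functions in the list; only the opening call changes.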
4.2 Adding Packages
There are add-ons to the basic R system called packages. Typically researchers bundle up their new tools in a package and place it with CRAN. If you want a particular analysis there is probably a package available to do what you want.
Do look at
• http://cran.r-project.org/web/packages/available_packages_by_name.html
• http://blog.yhathq.com/posts/10RpackagesIwishIknewaboutearlier.html
• http://crantastic.org/
To add a package to your machine and use it:
1. Download and install the package. It will then be available for use in future sessions as well as the current one.
2. To use the package, invoke the library(package) command to load it into the current session. You need to do this each session, unless you customize your environment to load it automatically each time.
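The two steps can also be carried out from the command line with install.packages() and library(). The helper below is a hypothetical convenience wrapper, not part of R: it installs a package only when it is missing and then loads it.

```r
# Hypothetical helper: install a package if necessary, then load it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)     # step 1: needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)   # step 2: load into the current session
  invisible(paste0("package:", pkg) %in% search())
}

# splines ships with R, so nothing is downloaded here
use_package("splines")
search()    # the loaded package now appears on the search path
```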
4.2.1 Windows
• Choose Install Packages from the Packages menu.
• Select a CRAN Mirror. (I find Switzerland reliable)
• Select a package e.g. boot.
• Then use the library(package) function to load it for use, either library(boot) or the drop down menu.
4.2.2 OS X
• Choose Install Packages from the Packages and Data Menu.
• Select a CRAN Mirror. (I find Switzerland reliable)
• Select a package e.g. boot.
• Then use the library(package) function to load it for use, either library(boot) or Package Installer from the drop down menu.
4.3 Some examples
• The boxplot.matrix( ) function in the sfsmisc package draws a boxplot for each column (row) in a matrix.
• The boxplot.n( ) function in the gplots package annotates each boxplot with its sample size.
• The bplot( ) function in the Rlab package offers many more options controlling the positioning and labeling of boxes in the output.
• A violin plot is a combination of a boxplot and a kernel density plot. They can be created using the vioplot( ) function from the vioplot package.
Creating a new package is reasonably (?) straightforward, and because R is now so widely used in academia the majority of authors of publications describing new analysis techniques release an R package when they publish their new ideas.
4.3.1 Packages you already have
Some packages come with the base installation of R but are not automatically loaded when you start the software. To see what they are, either look at the drop down menu or type library(). Examples are lattice, which includes functions for a range of advanced graphics; MASS, which is a package associated with the book Modern Applied Statistics with S by Venables and Ripley (2002, Springer-Verlag) and contains a variety of useful functions to do things like fit generalized linear models with negative binomial errors; nlme, which lets you fit linear and non-linear mixed-effects models; cluster, which brings a range of functions for cluster analysis; and survival, which has functions for survival analysis (surprise!).
These packages will probably already be installed on your computer, but to make sure you can use the library() function. Just type it in with nothing between the brackets and you will get more information about what is there than you really need. If you want to use one of them you can load it into R by using
library(package name)
or the drop down menu, and it will be loaded. If you want to know a bit more about what's in a particular package you can type
library(help = splines)
where splines is the name of the package. This will get you some information about the package and a list of the various functions that are included in the package. If you want to know more, as we said, one of the easiest ways of finding out more information is to go to
http://cran.r-project.org/web/packages
which lists all 4000 odd packages currently available for R. If you click on the name of a package you will be able to navigate to a link for the package manual, which should tell you everything you might ever want to know. It might be difficult to follow because it's likely to be written for consumption by clever people, but you just need to persevere and use Google. I find http://crantastic.org/ a wonderful resource. If you look on CRAN you can find the web page for vegan at
http://cran.r-project.org/web/packages/vegan/index.html
which lets you look at the manual for the package and also provides links to a number of vignettes: documents giving details of how to carry out specific analyses using the package. This can be very useful once the package is installed. When it is the wee small hours and you would sell your cat’s soul to Satan just to get that analysis done, those vignettes can preserve your sanity.
5 Appendix 1: Vocabulary
The first functions to learn
?
str
# Important operators and assignment
%in%, match
=, <-, <<-
$, [, [[, head, tail, subset
with
assign, get
# Comparison
all.equal, identical
!=, ==, >, >=, <, <=
is.na, complete.cases
is.finite
# Basic math
*, +, -, /, ^, %%, %/%
abs, sign
acos, asin, atan, atan2
sin, cos, tan
ceiling, floor, round, trunc, signif
exp, log, log10, log2, sqrt
max, min, prod, sum
cummax, cummin, cumprod, cumsum, diff
pmax, pmin
range
mean, median, cor, sd, var
rle
# Functions to do with functions
function
missing
on.exit
return, invisible
# Logical & sets
&, |, !, xor
all, any
intersect, union, setdiff, setequal
which
# Vectors and matrices
c, matrix
# automatic coercion rules character > numeric > logical
length, dim, ncol, nrow
cbind, rbind
names, colnames, rownames
t
diag
sweep
as.matrix, data.matrix
# Making vectors
c
rep, rep_len
seq, seq_len, seq_along
rev
sample
choose, factorial, combn
(is/as).(character/numeric/logical/...)
# Lists and data.frames
list, unlist
data.frame, as.data.frame
split
expand.grid
# Control flow
if, &&, || (short circuiting)
for, while
next, break
switch
ifelse
# Apply & friends
lapply, sapply, vapply
apply
tapply
replicate
#Common data structures
# Date time
ISOdate, ISOdatetime, strftime, strptime, date
difftime
julian, months, quarters, weekdays
library(lubridate)
# Character manipulation
grep, agrep
gsub
strsplit
chartr
nchar
tolower, toupper
substr
paste
library(stringr)
# Factors
factor, levels, nlevels
reorder, relevel
cut, findInterval
interaction
options(stringsAsFactors = FALSE)
# Array manipulation
array
dim
dimnames
aperm
library(abind)
#Statistics
# Ordering and tabulating
duplicated, unique
merge
order, rank, quantile
sort
table, ftable
# Linear models
fitted, predict, resid, rstandard
lm, glm
hat, influence.measures
logLik, df, deviance
formula, ~, I
anova, coef, confint, vcov
contrasts
# Miscellaneous tests
apropos("\\.test$")
# Random variables
(q, p, d, r) * (beta, binom, cauchy, chisq, exp, f, gamma, geom,
hyper, lnorm, logis, multinom, nbinom, norm, pois, signrank, t,
unif, weibull, wilcox, birthday, tukey)
# Matrix algebra
crossprod, tcrossprod
eigen, qr, svd
%*%, %o%, outer
rcond
solve
#Working with R
# Workspace
ls, exists, rm
getwd, setwd
q
source
install.packages, library, require
# Help
help, ?
help.search
apropos
RSiteSearch
citation
demo
example
vignette
# Debugging
traceback
browser
recover
options(error = )
stop, warning, message
tryCatch, try
#I/O
# Output
print, cat
message, warning
dput
format
sink, capture.output
# Reading and writing data
data
count.fields
read.csv, write.csv
read.delim, write.table
read.fwf
readLines, writeLines
readRDS, saveRDS
load, save
library(foreign)
# Files and directories
dir
basename, dirname, tools::file_ext
file.path
path.expand, normalizePath
file.choose
file.copy, file.create, file.remove, file.rename, dir.create
file.exists, file.info
tempdir, tempfile
download.file, library(downloader)
6 Appendix 2 : Numerical Types
type       mode       storage.mode  example
logical    logical    logical       TRUE or FALSE
integer    numeric    integer       4L
double     numeric    double        4.0000
complex    complex    complex       3+5i
character  character  character     "word"
raw        raw        raw           The raw type is intended to hold raw bytes
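The table can be reproduced directly with typeof(), mode() and storage.mode(). This is a small sketch; the list x and the data frame types are names chosen here for illustration.

```r
# One example value of each basic type
x <- list(logical = TRUE, integer = 4L, double = 4.0,
          complex = 3+5i, character = "word", raw = as.raw(255))

# One row per value, one column per classification function
types <- data.frame(
  type         = sapply(x, typeof),
  mode         = sapply(x, mode),
  storage.mode = sapply(x, storage.mode)
)
types
```

Note the L suffix: typing 4 on its own creates a double, while 4L creates an integer.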
7 Regression
1. Linear Regression
fit <- lm(y ~ x1 + x2 + x3, data=mydata)
summary(fit) # show results
2. Other useful functions
coefficients(fit) # model coefficients
confint(fit, level=0.95) # CIs for model parameters
fitted(fit) # predicted values
residuals(fit) # residuals
anova(fit) # anova table
vcov(fit) # covariance matrix for model parameters
influence(fit) # regression diagnostics
# diagnostic plots
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(fit)
3. compare models
fit1 <- lm(y ~ x1 + x2 + x3 + x4, data=mydata)
fit2 <- lm(y ~ x1 + x2, data=mydata)
anova(fit1, fit2)
4. K-fold cross-validation
library(DAAG)
cv.lm(df=mydata, fit, m=3) # 3 fold cross-validation
5. Assessing R2 shrinkage using 10-Fold Cross-Validation
fit <- lm(y~x1+x2+x3,data=mydata)
library(bootstrap)
# define functions
theta.fit <- function(x,y){lsfit(x,y)}
theta.predict <- function(fit,x){cbind(1,x)%*%fit$coef}
# matrix of predictors
X <- as.matrix(mydata[c("x1","x2","x3")])
# vector of response values
y <- as.matrix(mydata[c("y")])
results <- crossval(X,y,theta.fit,theta.predict,ngroup=10)
cor(y, fit$fitted.values)**2 # raw R2
cor(y,results$cv.fit)**2 # cross-validated R2
6. Stepwise Regression
library(MASS)
fit <- lm(y~x1+x2+x3,data=mydata)
step <- stepAIC(fit, direction="both")
step$anova # display results
7. All Subsets Regression
library(leaps)
attach(mydata)
leaps<-regsubsets(y~x1+x2+x3+x4,data=mydata,nbest=10)
# view results
summary(leaps)
# plot a table of models showing variables in each model.
# models are ordered by the selection statistic.
plot(leaps,scale="r2")
# plot statistic by subset size
library(car)
subsets(leaps, statistic="rsq")
8. The relaimpo package provides measures of relative importance for each of the predictors in the model. See help(calc.relimp) for details on the four measures of relative importance provided.
9. Graphic Enhancements
The car package offers a wide variety of plots for regression, including added variable plots, and enhanced diagnostic and scatter plots.
10. Robust Regression
There are many functions in R to aid with robust regression. For example, you can perform robust regression with the rlm( ) function in the MASS package. The UCLA Statistical Computing website has robust regression examples.
The robust package provides a comprehensive library of robust methods, including regression. The robustbase package also provides basic robust statistics including model selection methods. And David Olive has provided a detailed online review of Applied Robust Statistics with sample R code.
11. GLMs
Generalized linear models are just as easy to fit in R as ordinary linear models. In fact, they require only an additional parameter to specify the variance and link functions.
> glm(formula, family, data, weights, subset, ...)
where ... stands for more esoteric options. The only parameter that we have not encountered before is family, which is a simple way of specifying a choice of variance and link functions. There are six choices of family:
Family            Variance          Link
gaussian          gaussian          identity
binomial          binomial          logit, probit or cloglog
poisson           poisson           log, identity or sqrt
Gamma             Gamma             inverse, identity or log
inverse.gaussian  inverse.gaussian  1/mu^2
quasi             user-defined      user-defined
As can be seen, each of the first five choices has an associated variance function (for binomial the binomial variance m(1-m)), and one or more choices of link functions (for binomial the logit, probit or complementary log-log).
As long as you want the default link, all you have to specify is the family name. If you want an alternative link, you must add a link argument. For example, to do probits you use
> glm( formula, family=binomial(link=probit))
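As a concrete sketch of the family and link arguments, the following fits the same binary-response model twice on the built-in mtcars data, once with the default logit link and once with a probit link. The choice of am as response and wt as predictor is only for illustration.

```r
# am (0/1) is a binary response; wt is the weight of the car
fit_logit  <- glm(am ~ wt, family = binomial, data = mtcars)   # default link
fit_probit <- glm(am ~ wt, family = binomial(link = "probit"), data = mtcars)

family(fit_logit)$link     # link actually used by the first fit
family(fit_probit)$link    # link actually used by the second fit

coef(fit_probit)           # coefficients on the probit scale
summary(fit_probit)        # the usual summary methods apply
```

The functions listed earlier for lm fits, such as coefficients(), fitted() and anova(), also work on glm fits.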
8 Multivariate Data
See http://cran.r-project.org/web/views/Multivariate.html
• Visualising multivariate data
A range of base graphics (e.g. pairs() and coplot()) and lattice functions (e.g. xyplot() and splom()) are useful for visualising pairwise arrays of 2-dimensional scatterplots, clouds and 3-dimensional densities. scatterplot.matrix in the car package provides usefully enhanced pairwise scatterplots. Beyond this, scatterplot3d provides 3 dimensional scatterplots, aplpack provides bagplots and spin3R(), a function for rotating 3d clouds. misc3d, dependent upon rgl, provides animated functions within R useful for visualising densities. YaleToolkit provides a range of useful visualisation techniques for multivariate data.
More specialised multivariate plots include the following:
– faces() in aplpack provides Chernoff’s faces;
– parcoord() from MASS provides parallel coordinate plots;
– stars() in graphics provides a choice of star, radar and cobweb plots respectively.
– mstree() in ade4 and spantree() in vegan provide minimum spanning tree functionality.
– calibrate supports biplot and scatterplot axis labelling.
– geometry, which provides an interface to the qhull library, gives indices to the relevant points via convexhulln().
– ellipse draws ellipses for two parameters, and provides plotcorr(), visual display of a correlation matrix.
– denpro provides level set trees for multivariate visualisation.
– Mosaic plots are available via mosaicplot() in graphics and mosaic() in vcd that also contains other visualization techniques for multivariate categorical data.
– gclus provides a number of cluster specific graphical enhancements for scatterplots and parallel coordinate plots. See the links for a reference to GGobi.
– rggobi interfaces with GGobi.
– xgobi interfaces to the XGobi and XGvis programs which allow linked, dynamic multivariate plots as well as projection pursuit.
– Finally, iplots allows particularly powerful dynamic interactive graphics, of which interactive parallel co-ordinate plots and mosaic plots may be of great interest. Seriation methods are provided by seriation which can reorder matrices and dendrograms.
• Data Preprocessing:
– summarize() and summary.formula() in Hmisc assist with descriptive functions; from the same package varclus() offers variable clustering while dataRep() and find.matches() assist in exploring a given dataset in terms of representativeness and finding matches.
– dist() in stats and daisy() in cluster provide a wide range of distance measures, proxy provides a framework for more distance measures, including measures between matrices.
– simba provides functions for dealing with presence / absence data including similarity matrices and reshaping.
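As a small illustration of the distance measures mentioned above, here is a sketch using dist() on three made-up points; the matrix pts and its row names are invented for the example.

```r
# Three points in the plane, one per row
pts <- rbind(a = c(0, 0), b = c(3, 4), c = c(6, 8))

d_euc <- dist(pts)                         # Euclidean distance is the default
d_man <- dist(pts, method = "manhattan")   # sum of absolute differences

# dist objects store only the lower triangle; as.matrix() shows the full table
as.matrix(d_euc)
as.matrix(d_man)
```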
• Linear models
9 Regression
9.0.2 Model Formulae for ANOVA and regression
R functions such as aov( ), lm( ), and glm( ) use a formula interface to specify the variables to be included in the analysis. The formula determines the model that will be built (and tested) by the R procedure. The basic format of such a formula is...
response variable ~ explanatory variables
The tilde should be read "is modeled by" or "is modeled as a function of." A basic regression analysis would be formulated this way.
y ~ x
where "x" is the explanatory variable, and "y" is the response variable. Additional explanatory variables would be added in as follows...
y ~ x + z
which would make this a multiple regression with two predictors. This raises a critical issue that must be understood to get model formulae correct. Symbols used as mathematical operators in other contexts do not have their usual mathematical meaning inside model formulae. The following table lists the meaning of these symbols when used in a formula.
symbol  example        meaning
+       +x             include this variable
-       -x             delete this variable
:       x:z            include the interaction between these variables
*       x*z            include these variables and the interactions between them
/       x/z            nesting: include z nested within x
|       x|z            conditioning: include x given z
^       (u + v + w)^3  include these variables and all interactions up to three way
poly    poly(x,3)      polynomial regression: orthogonal polynomials
Error   Error(a/b)     specify the error term
I       I(x*z)         as is: include a new variable consisting of these variables multiplied
1       -1             intercept: delete the intercept (regress through the origin)
Some formula structures can be specified in more than one way...
y ~ u + v + w + u:v + u:w + v:w + u:v:w
y ~ u * v * w
y ~ (u + v + w)^3
All three of these specify a model in which the variables "u", "v", "w", and all the interactions between them are included. Any of these formats...
y ~ u + v + w + u:v + u:w + v:w
y ~ u * v * w - u:v:w
y ~ (u + v + w)^2
would delete the three way interaction.
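These equivalences can be checked without fitting anything, because terms() expands a formula into its individual terms. The helper term_labels below is a hypothetical convenience function written for this sketch.

```r
# Hypothetical helper: list the terms that a formula expands to
term_labels <- function(f) attr(terms(f), "term.labels")

full_a <- term_labels(y ~ u + v + w + u:v + u:w + v:w + u:v:w)
full_b <- term_labels(y ~ u * v * w)
full_c <- term_labels(y ~ (u + v + w)^3)

two_a <- term_labels(y ~ u + v + w + u:v + u:w + v:w)
two_b <- term_labels(y ~ u * v * w - u:v:w)
two_c <- term_labels(y ~ (u + v + w)^2)

full_b                                                  # the expanded terms
setequal(full_a, full_b) && setequal(full_b, full_c)    # same full model
setequal(two_a, two_b) && setequal(two_b, two_c)        # same reduced model
```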
The nature of the variables (binary, categorical (factors), numerical) will determine the nature of the analysis. For example, if "u" and "v" are factors...
y ~ u + v
dictates an analysis of variance (without the interaction term). If "u" and "v" are numerical, the same formula would dictate a multiple regression. If "u" is numerical and "v" is a factor, then an analysis of covariance is dictated.
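The following sketch shows the same kind of formula producing the three analyses, using the built-in mtcars data. The factor versions cyl_f and am_f and the data frame mt are created here for the example.

```r
# Make factor versions of two variables that are stored as numbers
mt <- transform(mtcars, cyl_f = factor(cyl), am_f = factor(am))

fit_reg   <- lm(mpg ~ wt + hp,      data = mt)  # both numerical: multiple regression
fit_aov   <- lm(mpg ~ cyl_f + am_f, data = mt)  # both factors: analysis of variance
fit_ancov <- lm(mpg ~ wt + cyl_f,   data = mt)  # one of each: analysis of covariance

names(coef(fit_reg))     # one slope per numerical variable
names(coef(fit_ancov))   # a factor adds one coefficient per non-baseline level
anova(fit_aov)           # the analysis of variance table
```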
– From stats, lm() (with a matrix specified as the dependent variable) offers multivariate linear models, anova.mlm() provides comparison of multivariate linear models. manova() offers MANOVA. sn provides msn.mle() and mst.mle() which fit multivariate skew normal and multivariate skew t models.
– pls provides partial least squares regression (PLSR) and principal component regression, ppls provides penalized partial least squares, dr provides dimension reduction regression options such as "sir" (sliced inverse regression), "save" (sliced average variance estimation).
– plsgenomics provides partial least squares analyses for genomics. relaimpo provides functions to investigate the relative importance of regression parameters.
– Principal components can be fitted with prcomp() (based on svd(), preferred) as well as princomp() (based on eigen() for compatibility with S-PLUS) from stats.
– sca provides simple components. pc1() in Hmisc provides the first principal component and gives coefficients for unscaled data.
– Additional support for an assessment of the scree plot can be found in nFactors, whereas paran provides routines for Horn’s evaluation of the number of dimensions to retain.
– For wide matrices, gmodels provides fast.prcomp() and fast.svd().
Further options for principal components in an ecological setting are available within ade4 and in a sensory setting in SensoMineR. psy provides a variety of routines useful in psychometry, in this context these include sphpca() which maps onto a sphere and fpca() where some variables may be considered as dependent as well as scree.plot() which has the option of adding simulation results to help assess the observed data. PTAk provides principal tensor analysis analogous to both PCA and correspondence analysis. smatr provides standardised major axis estimation.
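To see the two principal component functions from stats side by side, here is a sketch on the built-in USArrests data. Working on the correlation scale (scale. = TRUE and cor = TRUE) is a choice made for this example so that the two results are directly comparable.

```r
pc_svd   <- prcomp(USArrests, scale. = TRUE)   # based on svd()
pc_eigen <- princomp(USArrests, cor = TRUE)    # based on eigen()

summary(pc_svd)                                # proportion of variance explained

# Standard deviations of the components from each function
pc_svd$sdev
unname(pc_eigen$sdev)

# The loadings agree up to the sign of each column
max(abs(abs(pc_svd$rotation) - abs(unclass(pc_eigen$loadings))))
```

A scree plot of either fit is available with screeplot(pc_svd).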
• Latent variable approaches
– factanal() in stats provides factor analysis by maximum likelihood, Bayesian factor analysis is provided for Gaussian, ordinal and mixed variables in MCMCpack.
– GPArotation offers GPA (gradient projection algorithm) factor rotation. FAiR provides factor analysis solved using genetic algorithms.
– sem fits linear structural equation models and ltm provides latent trait models under item response theory and a range of extensions to Rasch models can be found in eRm.
– FactoMineR provides a wide range of Factor Analysis methods, including MFA() and HMFA() for multiple and hierarchical multiple factor analysis as well as ADFM() for multiple factor analysis of quantitative and qualitative data.
– tsfa provides factor analysis for time series. poLCA provides latent class and latent class regression models for a variety of outcome variables.
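A short sketch of factanal() from stats, using the ability.cov covariance matrix that ships with R in the datasets package. The choice of two factors is an assumption made for this example.

```r
# ability.cov holds the covariance matrix of six ability tests
fa <- factanal(factors = 2, covmat = ability.cov)

print(fa$loadings, cutoff = 0.3)   # hide small loadings for readability
round(fa$uniquenesses, 2)          # variance not explained by the factors
```

By default factanal() applies a varimax rotation; other rotations can be requested through its rotation argument.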