Source: ecosensing.org/wp-content/uploads/2012/04/R-commander-tutorial1.pdf
R Commander Tutorial
Introduction
R is a powerful, freely available software package for analyzing and graphing data. However, for
somebody who does not use statistical software frequently, the big drawback of R is that it is
command-line based and thus not very intuitive to use. For such users, R Commander might be a good
alternative. R Commander is an R package that lets you run R from a graphical user interface. This
makes analyzing and graphing your data in R a lot easier.
Objective
The objective of this tutorial is to give you a basic introduction to R Commander and how to use it to run
basic statistics and create graphs.
1. Start the R Commander
Open R by either clicking on the R icon on your desktop or by navigating to R in your programs folder.
Once you have opened R, go to Packages/Load Packages … on the R menu bar and find Rcmdr in the list of
R packages (R packages are add-ons for R, similar to software programs, that have been written by
different contributors). Highlight Rcmdr by clicking on it and click OK.
R might give you a warning message. If so, just ignore it and click No. The R Commander console should
now appear on your screen and you are ready to run some statistics and make some graphs in R.
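The menu steps above can also be carried out from the R Console. The following is only a minimal sketch: the helper name start_commander is made up for this example, and it assumes that you may or may not have the Rcmdr package installed already.

```r
# Hypothetical helper (not part of R or Rcmdr): load Rcmdr if it is installed,
# otherwise print a hint on how to install it.
start_commander <- function() {
  if (!requireNamespace("Rcmdr", quietly = TRUE)) {
    message('Rcmdr is not installed. Run install.packages("Rcmdr") first.')
    return(invisible(FALSE))
  }
  library(Rcmdr)  # loading the package opens the R Commander window
  invisible(TRUE)
}

# At the R Console prompt, type start_commander() to open R Commander.
```

The function is only defined here, not called, so nothing happens until you type start_commander() yourself.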
2. Reading your data into R
After you come back from the field, your notebook shows the following data recordings:
Now you want to create a digital copy of your data. To do this, start your computer, type the data
table into Notepad or another text editor of your choice, and save the data table on your hard drive
(Important: the data values have to be separated by commas as shown below). Make sure you remember where
you saved it so you can navigate to the dataset later on.
On the R Commander menu bar, go to Data/Import data and select from text file, clipboard, or URL …
which should bring up the window below. Make the same selections as shown in the window below (e.g.
name your data set cover_moisture and select Commas as your field separator since we separated our
data by commas when we entered them into our text editor earlier).
Click OK and a window appears that allows you to navigate to your data file. Once you have navigated to
your data file, highlight it by clicking on it and click Open. You can now view your data by clicking on
View data set on the R Commander menu bar.
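For readers curious about what the import step does, here is a rough sketch in plain R. The data values are made up for illustration (the original field notes are not reproduced here), and the file is written to a temporary location rather than to a folder of your choice.

```r
# Hypothetical field data; the values are invented for illustration only.
csv_lines <- c(
  "location,cover,soil.moisture",
  "1,10,12",
  "2,25,18",
  "3,40,21",
  "4,55,30",
  "5,70,33",
  "6,85,41"
)

# Write the table to a temporary text file, much like saving it from a text editor.
data_file <- tempfile(fileext = ".txt")
writeLines(csv_lines, data_file)

# Read the comma-separated file; header = TRUE takes the variable names
# from the first line, sep = "," sets the comma as the field separator.
cover_moisture <- read.table(data_file, header = TRUE, sep = ",")

print(cover_moisture)  # show the data set
str(cover_moisture)    # show the variable names and types
```

The sep argument plays the same role as the field separator option in the import window.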
You can also directly enter your data into R by selecting Data from the R Commander menu bar and
clicking on New dataset …. This will bring up the following window.
The Data Editor window then appears, which allows you to enter your data directly into R. By clicking on
a column header, you can change the variable name of that column (e.g. change var1 to location, var2 to
cover, and var3 to soil.moisture). The variable editor also allows you to select the type of the variables
you are entering. Since you are entering numeric values, select numeric under variable type. Type in
your data as shown below.
3. Summary statistics
To get some summary statistics of your data, go to Statistics/Summaries and select Numerical
summaries …. Now you should see the following window:
Pick cover and soil.moisture (Note: to select more than one variable you have to hold down the Ctrl key)
and click OK. A summary table will appear that shows the mean, the standard deviation, and the 0, 0.25,
0.50, 0.75, and 1 quantiles (i.e. minimum, first quartile, median, third quartile, and maximum) of the
cover and soil.moisture data.
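The same kind of summary table can be approximated in plain R. This is a sketch under assumptions: the data values are made up, and the helper numeric_summary is a name invented for this example, not the function R Commander itself calls.

```r
# Hypothetical data; the values are invented for illustration only.
cover_moisture <- data.frame(
  cover         = c(10, 25, 40, 55, 70, 85),
  soil.moisture = c(12, 18, 21, 30, 33, 41)
)

# Hypothetical helper: mean, standard deviation, and the five quantiles of one variable.
numeric_summary <- function(x) {
  c(mean = mean(x), sd = sd(x), quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1)))
}

# Apply the helper to both variables; t() puts one variable per row.
summary_table <- t(sapply(cover_moisture[, c("cover", "soil.moisture")], numeric_summary))
print(summary_table)
```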
4. Scatterplot
To see if there is a relationship between cover and soil moisture, it is a good idea to look at a
scatterplot of the data. To create a scatterplot, go to Graphs on the R Commander menu bar and select
Scatterplot …. This will bring up a window. Select cover as your x-variable and soil.moisture as your
y-variable. Label your x- and y-axis Cover (%) and Soil Moisture (%), respectively. Next, click OK and a
scatterplot will appear (Important: Make sure you highlight the R Console by clicking on it to be able to
see the scatterplot). You can save the scatterplot (or any other plot you create) by clicking on the plot
(Important: if you do not select the plot you won’t be able to save it), then going to File/Save as/Jpeg
on the R menu bar (Note: the R menu bar, not the R Commander menu bar) and clicking on 100% quality … .
This will bring up a window that allows you to specify the location on your computer where you want to
save the plot as a Jpeg image.
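Plotting and saving can also be done with a few lines of R code. The sketch below uses made-up data values and writes the image to a temporary folder; the file name cover_vs_moisture.jpg is an arbitrary choice for this example. The code skips the saving step if your R installation cannot write Jpeg files.

```r
# Hypothetical data; the values are invented for illustration only.
cover_moisture <- data.frame(
  cover         = c(10, 25, 40, 55, 70, 85),
  soil.moisture = c(12, 18, 21, 30, 33, 41)
)

# Where to save the image (a temporary folder in this sketch).
plot_file <- file.path(tempdir(), "cover_vs_moisture.jpg")

if (capabilities("jpeg")) {
  jpeg(plot_file, quality = 100)         # open a Jpeg file at 100% quality
  plot(soil.moisture ~ cover, data = cover_moisture,
       xlab = "Cover (%)", ylab = "Soil Moisture (%)")
  dev.off()                              # close the file so it is written to disk
}
```

To see the plot on screen instead, run only the plot(...) call without jpeg() and dev.off().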
5. Fitting a linear regression model
The scatterplot above shows us that there is a positive relationship between soil moisture and cover.
However, the scatterplot does not tell us how strong the relationship is or whether it is statistically
significant. To get this information we have to fit a linear regression model. To fit a linear regression
model, go to Statistics/Fit models on the R Commander menu bar and select Linear model … . Select
soil.moisture as your response variable (aka y-variable or dependent variable) and cover as your
explanatory variable (aka x-variable or independent variable) and click OK.
The following output will appear in the Output Window of the R Commander:
We will talk in class about how to interpret the output table (e.g. what those numbers mean). To check
the basic model diagnostics for the linear model you just fit, go to Models/Graphs on the R Commander
menu bar and select Basic diagnostic plots. This brings up the following window (We will discuss in class
how to interpret the model diagnostic plots):
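The model fit and the diagnostic plots from this section correspond roughly to the R code sketched below. The data values are made up for illustration, and the object name moisture_model is an arbitrary choice.

```r
# Hypothetical data; the values are invented for illustration only.
cover_moisture <- data.frame(
  cover         = c(10, 25, 40, 55, 70, 85),
  soil.moisture = c(12, 18, 21, 30, 33, 41)
)

# Fit the linear regression: soil.moisture is the response, cover the explanatory variable.
moisture_model <- lm(soil.moisture ~ cover, data = cover_moisture)

# Print the output table (coefficients, standard errors, p-values, R-squared).
print(summary(moisture_model))

# Draw the four basic diagnostic plots in a 2 x 2 layout, then restore the old layout.
old_par <- par(mfrow = c(2, 2))
plot(moisture_model)
par(old_par)
```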
6. Fitting multiple regression models
In this part of the tutorial you will learn how to fit a multiple regression model. Your hypothesis is that
air temperature, solar radiation, and wind speed are significant predictors of ozone. To test this
hypothesis, you collected the data called “airquality.txt” that are available in the class Dropbox folder
(C:\...\Dropbox\Jan Teaching Files\CSS 560\Data\R Commander\airquality.txt) (Note: The data were
taken from Dalgaard, 2002). Let's import the data into R Commander and call the dataset airquality (if
you can't remember how to import data, please refer to section 2 of this tutorial). Let's take a look at
the data to familiarize ourselves with them by selecting airquality from the Data set dropdown menu.
Next, let's plot the relationships between the different variables in the dataset. To do this, make the R
Console active by clicking on it and type the following command at the R Console command line
prompt: pairs(airquality).
Now you should see the following figure:
This is how you read the figure:
It looks like there is some sort of relationship between ozone and temperature and ozone and wind.
However, there seems to be no relationship between ozone and solar radiation.
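If you do not have the class file at hand, you can try the scatterplot matrix on the airquality dataset that ships with R. Note that this is an assumption: the built-in dataset is used here as a stand-in and may not be identical to airquality.txt. The correlation matrix at the end is an optional addition that puts numbers on what the plot shows.

```r
# Built-in R dataset, used as a stand-in for the class file airquality.txt (assumption).
data(airquality)
str(airquality)

# Scatterplot matrix of the four variables of interest.
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
pairs(airquality[, vars])

# Pairwise correlations, using only the rows without missing values.
cor_matrix <- cor(airquality[, vars], use = "complete.obs")
print(round(cor_matrix, 2))
```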
OK - let's now fit a multiple regression model to test if solar radiation, wind, and temperature are
significant predictors of ozone. To fit a multiple regression model, go to Statistics/Fit models... on
the R Commander menu bar and select Linear model... . A window appears that should be somewhat
familiar to you from section 5 of this tutorial. The model you want to fit says that ozone is a
function of solar radiation, air temperature, and wind. In R's model notation, we can write this model as
follows:
Ozone ~ Solar.R + Temp + Wind [1]
After typing model [1] in the appropriate section of the linear model window (see above) click OK. You
should now see the following output:
Let's also take a look at the model diagnostics:
We will discuss the interpretation of the model output as well as the interpretation of the model
diagnostics in more detail in class.
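Model [1] can also be fit directly in R. The sketch below again uses the airquality dataset that ships with R as a stand-in for airquality.txt, which is an assumption; the object name ozone_model is an arbitrary choice.

```r
# Built-in R dataset, used as a stand-in for the class file airquality.txt (assumption).
data(airquality)

# Fit model [1]: ozone as a function of solar radiation, temperature, and wind.
# Rows with missing values are dropped automatically by lm().
ozone_model <- lm(Ozone ~ Solar.R + Temp + Wind, data = airquality)
print(summary(ozone_model))

# Basic diagnostic plots in a 2 x 2 layout, then restore the old layout.
old_par <- par(mfrow = c(2, 2))
plot(ozone_model)
par(old_par)
```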
7. Paired t-test
Next, we will conduct a paired t-test to see if there is a statistically significant difference in soil
moisture before and after a rain event. The data for the paired t-test are in the class Dropbox folder
(C:\Users\Jan\Dropbox\Jan Teaching Files\CSS 560\Data\R Commander\paired_t_test.txt). Import the
data into R by following the steps you learned at the beginning of this tutorial and name the
dataset soil_moisture (Hint: Open the paired_t_test.txt file in a text editor. You will see that the
paired_t_test.txt file is a tab-delimited file and not a comma-delimited file. You need that information
to properly import the data into R).
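To illustrate why the field separator matters, here is a small sketch with a made-up tab-delimited file. The values are invented and do not come from paired_t_test.txt; only the variable names follow the ones used later in this section.

```r
# Hypothetical tab-delimited data; "\t" stands for a tab character.
tab_lines <- c(
  "soil.moisture.before\tsoil.moisture.after",
  "12\t19",
  "15\t22",
  "11\t17",
  "18\t25"
)
data_file <- tempfile(fileext = ".txt")
writeLines(tab_lines, data_file)

# Correct import: the tab is given as the field separator.
soil_moisture <- read.table(data_file, header = TRUE, sep = "\t")
print(soil_moisture)

# Wrong separator: every line is read as a single column.
wrong_import <- read.table(data_file, header = TRUE, sep = ",")
print(ncol(wrong_import))
```

With the wrong separator the two variables are not split apart, so the t-test dialogs later on would not find the variables they need.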
Before conducting a paired t-test (or any other t-test) it is a good idea to look at a
boxplot of the data first. To do this you have to stack your data first (you are just rearranging the
data so they are in a format that the computer can use to create a boxplot) by going to
Data/Active data set on the R Commander menu bar and clicking on Stack variables in active data set … .
You should now see the Stack Variables window shown below. Select both the soil.moisture.after and
soil.moisture.before variables and name the stacked dataset stacked_soil_moisture. Keep the rest of the
default settings as shown below and click OK.
Next, go to Graphs/Boxplots… on the R Commander menu bar. In the window that pops up select Plot by
groups… and group your variables by factor and click OK. Now you should see the following boxplot:
Based on the boxplot, do you think the soil moisture changed significantly after the rain event?
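Stacking and plotting can be sketched in plain R as follows. The data values are made up for illustration, and the column names variable and factor are assumed to match the defaults of the Stack Variables window.

```r
# Hypothetical data; the values are invented for illustration only.
soil_moisture <- data.frame(
  soil.moisture.before = c(12, 15, 11, 18, 14, 16),
  soil.moisture.after  = c(19, 22, 17, 25, 20, 24)
)

# Stack the two columns: one column with all values, one with the column they came from.
stacked_soil_moisture <- stack(soil_moisture[, c("soil.moisture.after", "soil.moisture.before")])
names(stacked_soil_moisture) <- c("variable", "factor")
print(stacked_soil_moisture)

# Boxplot of the values, grouped by the factor column.
boxplot(variable ~ factor, data = stacked_soil_moisture,
        ylab = "Soil moisture", xlab = "factor")
```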
After visually looking at the data we are ready to run a paired t-test. To do this, let’s go back to our
original, unstacked dataset by going to Data set on the R Commander menu bar and selecting
soil_moisture. Click OK.
Next, go to Statistics/Means on the R Commander menu bar and select Paired t-test … .
Next, select soil.moisture.before as your first variable and soil.moisture.after as your second variable.
Keep the rest at the default settings as shown below.
After clicking OK you should get the following output. We will discuss in class how to interpret the
output.
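For reference, a paired t-test can be run in plain R as sketched below. The data values are made up, and the settings (two-sided alternative, 95% confidence level) are assumed to match the defaults of the dialog window.

```r
# Hypothetical data; the values are invented for illustration only.
soil_moisture <- data.frame(
  soil.moisture.before = c(12, 15, 11, 18, 14, 16),
  soil.moisture.after  = c(19, 22, 17, 25, 20, 24)
)

# Paired t-test on the unstacked data: each row is one pair of measurements.
paired_result <- with(soil_moisture,
  t.test(soil.moisture.before, soil.moisture.after,
         alternative = "two.sided", conf.level = 0.95, paired = TRUE))
print(paired_result)
```

Note that the test uses the original, unstacked dataset because the pairing relies on the before and after values sitting in the same row.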
8. Two-sample t-test
In this section of the tutorial we will learn how to conduct a two-sample t-test. We want to test the
following hypothesis: soil pH of the non-treated stand in the Ponderosa State Park is statistically
significantly different from the soil pH in the treated part of the Park. The hypothetical data that were
collected are available in the class Dropbox folder (C:\...\Dropbox\Jan Teaching Files\CSS 560\Data\R
Commander\ph.txt).
Let's import the data into R Commander and create a boxplot of the data as we learned in section 7
of this tutorial (remember: you first have to stack the data in order to create the boxplot below. For
more details please refer to section 7 of this tutorial).
OK - it looks like the soil pH in the non-treated part of the forest is lower than in the treated part.
Let's now do a two-sample t-test to see if the soil pH values are statistically significantly different
from each other. To do this, keep your stacked pH dataset active, go to Statistics/Means on the
R Commander menu bar, and select Independent samples t-test... (if the Independent samples t-test...
option is greyed out, make sure that i) you stacked the pH dataset and ii) the stacked pH dataset is the
active dataset).
The window that now appears should look similar to the one below:
Keep the default settings and click OK. Now you should see the following output:
We will discuss in class how to interpret the output.
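A two-sample t-test on stacked data can be sketched in plain R as follows. Everything about the data is an assumption: the values and the column names non.treated and treated are made up and are not taken from ph.txt. The settings (two-sided alternative, 95% confidence level, unequal variances) are assumed to match the defaults of the dialog window.

```r
# Hypothetical pH data; values and column names are invented for illustration only.
pH <- data.frame(
  non.treated = c(5.1, 5.3, 4.9, 5.0, 5.2, 5.4),
  treated     = c(5.8, 6.0, 5.7, 6.1, 5.9, 6.2)
)

# Stack the data: one column with the pH values, one with the group they belong to.
pH_stacked <- stack(pH)
names(pH_stacked) <- c("variable", "factor")

# Two-sample t-test comparing the two groups, without assuming equal variances.
two_sample_result <- t.test(variable ~ factor, data = pH_stacked,
                            alternative = "two.sided", conf.level = 0.95,
                            var.equal = FALSE)
print(two_sample_result)
```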
9. Customize your graphs
If you want to customize your figures, you have to do a little bit of programming. For example, the
boxplot you created in section 8 of this tutorial is associated with the following line of code in your R
Commander script window:
boxplot(variable ~ factor, ylab = "pH", xlab="factor", data = pH_stacked)
We can now change this line of code a little to make the boxplot a little nicer. For example, we could type