STATS 330: Lecture 4
Posted on 03-Feb-2016
04/22/23 330 Lecture 4 1
Housekeeping
My contact details….
Plus much else on the course web page
www.stat.auckland.ac.nz/~lee/330/
Or via Cecil
Today’s lecture: R for graphics
Aim of the lecture:
To show you how to use R to produce the plots shown in the last few lectures
Getting data into R
In 330, as in many cases, data come in 2 main forms:
• As a text file
• As an Excel spreadsheet
We need to convert from these formats to R.
Data in R are organized in data frames:
• Row-by-column arrangement of data (as in Excel)
• Variables are columns
• Rows are cases (individuals)
Text files to R
Suppose we have the data in the form of a text file.
Edit the text file (use Notepad or similar) so that
• The first row consists of the variable names
• Each row of data (i.e. data on a complete case) corresponds to one line of the file
Suppose the data fields are separated by spaces and/or tabs.
Then, to create a data frame containing the data, we use the R function read.table.
Example: the cherry tree data
Suppose we have a text file called cherry.txt (probably created using Notepad or maybe Word, but saved as a text file)
First line: variable names
Data for each tree on a separate line, separated by “white space” (spaces or tabs)
Creating the data frame
In R, type
cherry.df = read.table(file.choose(),
header=TRUE)
and press the return key
This brings up a dialog box in which we select the file cherry.txt containing the data.
[Screenshot of the file dialog: click to select the file, then click to load the data]
Check all is OK!
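A minimal sketch of the same steps without the file dialog, so that it runs as it stands. Since cherry.txt itself is not available here, it assumes we can write a temporary file, and builds one from R's built-in trees data set (the same 31 black cherry trees, but with variables Girth, Height and Volume instead of diameter, height and volume). The checks at the end are one way to check all is OK.

```r
# Write the built-in trees data (31 black cherry trees) to a temporary
# whitespace-separated text file, with the variable names on the first line
cherry.file <- tempfile(fileext = ".txt")
write.table(trees, cherry.file, row.names = FALSE, quote = FALSE)

# Read it back; a path string replaces file.choose() here.
# header = TRUE says the first line holds the variable names
cherry.df <- read.table(cherry.file, header = TRUE)

# Check all is OK: size, variable types and the first few rows
dim(cherry.df)    # 31 rows, 3 columns
str(cherry.df)
head(cherry.df)
```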
Getting data from a spreadsheet (1)
Create the spreadsheet in Excel
Save it as Comma Delimited Text (CSV)
This is a text file with all cells separated by commas
File is called cherry.csv
Getting data from a spreadsheet (2)
In R, type
cherry.df = read.table(file.choose(),
   header=TRUE, sep=",")
and proceed as before
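A sketch of the CSV route under the same assumptions as before: a temporary CSV file is built from the built-in trees data with write.csv, standing in for a cherry.csv saved from Excel. It also shows read.csv, a standard R shortcut whose defaults are header = TRUE and sep = ",".

```r
# Write the built-in trees data to a temporary comma-separated (CSV) file,
# much as saving a spreadsheet as CSV from Excel would
cherry.csv <- tempfile(fileext = ".csv")
write.csv(trees, cherry.csv, row.names = FALSE)

# read.table, telling R that the fields are separated by commas
cherry.df <- read.table(cherry.csv, header = TRUE, sep = ",")

# read.csv has header = TRUE and sep = "," built in as defaults
cherry2.df <- read.csv(cherry.csv)
all.equal(cherry.df, cherry2.df)
```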
Getting data from the R330 package
The package R330 contains several data sets used in the course, including the cherry tree data
To access the data frame:
• Install the R330 package (see Appendix A.10 of the coursebook)
• In R, type
> library(R330)
> data(cherry.df)
Data frames and variables
Suppose we have read in data and made a data frame
At this point R doesn’t know about the variables in the data frame, so we can’t use e.g. the variable diameter in R commands
We need to type
attach(cherry.df)
to make the variables in cherry.df visible to R.
Alternatively, refer to the variable as cherry.df$diameter (better, because it is always clear which data frame the variable comes from).
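A short sketch comparing the two approaches, using the built-in trees data (so the variable is Height rather than height). The detach() call is an addition not on the slide: it removes the attached data frame from R's search path once we have finished with it.

```r
# Built-in trees data: 31 black cherry trees (Girth, Height, Volume)
trees.df <- trees

# The $ form names the data frame explicitly every time
mean(trees.df$Height)

# attach() puts the columns on R's search path, so bare names work ...
attach(trees.df)
mean(Height)
# ... so detach() afterwards, to avoid clashes with later data frames
# that contain variables with the same names
detach(trees.df)
```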
Scatterplots
In R, there are 2 distinct sets of functions for graphics, one for ordinary graphics, one for trellis.
E.g. for scatterplots, we use either plot (ordinary R) or xyplot (trellis).
In the next few slides, we discuss plot.
Simple plotting
plot(cherry.df$height,
     cherry.df$volume,
     xlab="Height (feet)",
     ylab="Volume (cubic feet)",
     main = "Volume versus height for 31 black cherry trees")
i.e. label the axes (give units if possible) and give a title.
[Figure: scatterplot of Volume (cubic feet) against Height (feet), titled Volume versus height for 31 black cherry trees]
Alternative form of plot
plot(volume ~ height,
     xlab="Height (feet)",
     ylab="Volume (cubic feet)",
     main = "Volume versus height for 31 black cherry trees",
     data = cherry.df)
We don’t need to use the $ notation with this form; note the reversal of x and y (the formula is y ~ x).
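Both forms of plot, sketched with the built-in trees data (so the variables are Height and Volume, with capitals). The two calls should draw the same picture; the formula form takes the y variable first.

```r
# Form 1: x and y given separately, using $ to pick out the variables
plot(trees$Height, trees$Volume,
     xlab = "Height (feet)", ylab = "Volume (cubic feet)",
     main = "Volume versus height for 31 black cherry trees")

# Form 2: formula y ~ x plus a data argument; y comes first
plot(Volume ~ Height, data = trees,
     xlab = "Height (feet)", ylab = "Volume (cubic feet)",
     main = "Volume versus height for 31 black cherry trees")
```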
Colours, points, etc.
par(bg="darkblue")
plot(cherry.df$height, cherry.df$volume,
     xlab="Height (feet)", ylab="Volume (cubic feet)",
     main = "Volume versus height for 31 black cherry trees",
     pch=19, fg="white", col.axis="lightblue", col.main="white",
     col.lab="white", col="white", cex=1.3)
Type
?par
for more info
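One caution with par: settings such as bg stay in force for later plots on the same device. A sketch of the usual pattern, assuming the built-in trees data in place of cherry.df: par() returns the old values of whatever it changes, so they can be put back afterwards.

```r
# Change the background; par() returns the old value(s) it replaced
old.par <- par(bg = "darkblue")

plot(trees$Height, trees$Volume,
     xlab = "Height (feet)", ylab = "Volume (cubic feet)",
     main = "Volume versus height for 31 black cherry trees",
     pch = 19, fg = "white", col.axis = "lightblue", col.main = "white",
     col.lab = "white", col = "white", cex = 1.3)

# Restore the previous background for later plots
par(old.par)
```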
[Figure: the same scatterplot, with white filled points and white labels on a dark blue background]
Lines
Suppose we want to join up the points for each rat on the rats plot (see the data on the next slide). We could try
plot(rats.df$day, rats.df$growth, type="l")
but this won’t work: points are plotted in the order they appear in the data frame, and each point is joined to the next, so the last point for one rat is joined to the first point for the next rat.
Rats: the data
> rats.df
   growth group rat change day
1     240     1   1      1   1
2     250     1   1      1   8
3     255     1   1      1  15
4     260     1   1      1  22
5     262     1   1      1  29
6     258     1   1      1  36
7     266     1   1      2  43
8     266     1   1      2  44
9     265     1   1      2  50
10    272     1   1      2  57
11    278     1   1      2  64
12    225     1   2      1   1
13    230     1   2      1   8
... More data
[Figure: growth against day, drawn with type="l" as a single line through all the points]
Don’t want this!
Solution
There are various solutions, but one is to plot each line separately, using subsetting (after attach(rats.df)):
plot(day, growth, type="n")    # draw axes and labels only
lines(day[rat==1], growth[rat==1])
lines(day[rat==2], growth[rat==2])
and so on … (boring!), or (better)
for(j in 1:16){lines(day[rat==j], growth[rat==j])}
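A runnable sketch of the loop. Since only the first few rows of rats.df are shown, it uses a small made-up data frame, toy.df (hypothetical values, not the rats data), with the same layout: one row per weighing, with columns rat, day and growth. It uses the $ form instead of attach, and unique() supplies the rat numbers, so the loop does not need to know how many rats there are.

```r
# Hypothetical data: 3 rats, each weighed on days 1, 8 and 15 (made-up values)
toy.df <- data.frame(rat    = rep(1:3, each = 3),
                     day    = rep(c(1, 8, 15), times = 3),
                     growth = c(100, 110, 120, 90, 105, 115, 95, 100, 125))

# type = "n" sets up the axes and labels but draws no points
plot(toy.df$day, toy.df$growth, type = "n", xlab = "day", ylab = "growth")

# One line per rat, using only that rat's rows
# (assumes each rat's rows are already in day order)
for (j in unique(toy.df$rat)) {
  this.rat <- toy.df$rat == j
  lines(toy.df$day[this.rat], toy.df$growth[this.rat])
}
```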
Indicating groups
We want to plot the litters in different colours and add a legend.
Rats 1-8 are litter 1, 9-12 litter 2, 13-16 litter 3.
plot(day, growth, type="n")
for(j in 1:8) lines(day[rat==j], growth[rat==j], col="white")     # litter 1
for(j in 9:12) lines(day[rat==j], growth[rat==j], col="yellow")   # litter 2
for(j in 13:16) lines(day[rat==j], growth[rat==j], col="purple")  # litter 3
The col argument sets the colour of each line (white lines only show up on a dark background, such as one set with par(bg="darkblue")).
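Legend
legend(13, 380, legend = c("Litter 1", "Litter 2", "Litter 3"),
       col = c("white","yellow","purple"), lwd = c(2,2,2), horiz = TRUE, cex = 0.7)
Here (13, 380) is the position of the top-left corner of the legend, in the units of the plot axes.
(Type ?legend for a full explanation of these parameters.)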
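A sketch of a variation on the litter colours, again with a made-up data frame toy.df (hypothetical values and a hypothetical litter column). Instead of one loop per litter, a vector of colours is indexed by each rat's litter number. The keyword "topleft" positions the legend without choosing coordinates by hand, and blue and red are used so the lines show on the default white background.

```r
# Hypothetical data: 4 rats in 2 litters, weighed on days 1, 8 and 15
toy.df <- data.frame(rat    = rep(1:4, each = 3),
                     litter = rep(c(1, 1, 2, 2), each = 3),
                     day    = rep(c(1, 8, 15), times = 4),
                     growth = c(100, 110, 120, 90, 105, 115,
                                95, 100, 125, 98, 108, 118))

litter.col <- c("blue", "red")   # colours for litter 1 and litter 2

plot(toy.df$day, toy.df$growth, type = "n", xlab = "day", ylab = "growth")
for (j in unique(toy.df$rat)) {
  this.rat <- toy.df$rat == j
  # Each rat belongs to one litter: look up that litter's colour
  lines(toy.df$day[this.rat], toy.df$growth[this.rat],
        col = litter.col[toy.df$litter[this.rat][1]], lwd = 2)
}

legend("topleft", legend = c("Litter 1", "Litter 2"),
       col = litter.col, lwd = 2, cex = 0.7)
```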
[Figure: growth against day, one line per rat, coloured by litter, with a horizontal legend Litter 1, Litter 2, Litter 3]
Points and text
x=1:25
y=1:25
plot(x,y, type="n")
points(x,y,pch=1:25, col="red",
cex=1.2)
[Figure: the 25 plotting symbols (pch = 1 to 25) drawn in red along the line y = x]
Points and text (3)
x=1:26
y=1:26
plot(x,y, type="n")
text(x,y, letters, col="blue", cex=1.2)
[Figure: the letters a to z drawn in blue along the line y = x]
Use of pos
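x = 1:10
y = 1:10
plot(x, y)
position = rep(c(2,4), 5)
mytext = rep(c("Left","Right"), 5)
text(x, y, mytext, pos=position)
pos = 1, 2, 3 and 4 place the text below, to the left of, above and to the right of the point, respectively.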
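A common use of text with pos is labelling the points of a scatterplot so that individual cases can be identified. A sketch with the built-in trees data, labelling each tree with its row number:

```r
plot(trees$Height, trees$Volume,
     xlab = "Height (feet)", ylab = "Volume (cubic feet)")
# pos = 4 puts each label to the right of its point; cex = 0.7 shrinks it
text(trees$Height, trees$Volume, labels = seq_len(nrow(trees)),
     pos = 4, cex = 0.7)
```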
Trellis
We must load the trellis library first:
library(lattice)
General form of trellis plots:
xyplot(y~x|W*Z, data=some.df)
This plots y against x in a separate panel for each combination of the conditioning variables W and Z.
We don’t need to use the $ form: trellis functions can pick out the variables, given the data frame.
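A sketch of the general form with two conditioning variables, using the built-in trees data. The factors tall and thick are hypothetical groupings made up for illustration (height above 76 feet, girth above the median). When lattice code is run from a script or inside a loop, the plot only appears when the trellis object is printed.

```r
library(lattice)

# Copy the built-in trees data and add two hypothetical grouping factors
trees.df <- trees
trees.df$tall  <- factor(ifelse(trees.df$Height > 76, "tall", "short"))
trees.df$thick <- factor(ifelse(trees.df$Girth > median(trees.df$Girth),
                                "thick", "thin"))

# Volume against Height, one panel per combination of tall and thick
p <- xyplot(Volume ~ Height | tall * thick, data = trees.df)
print(p)  # draws the plot (needed in scripts and loops)
```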
Main trellis functions
• dotplot: for dotplots; use when X is categorical and Y is continuous
• bwplot: for boxplots; use when X is categorical and Y is continuous
• xyplot: for scatter plots; use when both X and Y are continuous
• equal.count: use to turn a continuous conditioning variable into groups
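A sketch of bwplot and dotplot with a categorical X, using R's built-in PlantGrowth data (plant weight for the groups ctrl, trt1 and trt2), since the cherry data have no categorical variable:

```r
library(lattice)

# weight is continuous; group (ctrl, trt1, trt2) is categorical
p1 <- bwplot(weight ~ group, data = PlantGrowth)   # a boxplot for each group
p2 <- dotplot(weight ~ group, data = PlantGrowth)  # a dotplot for each group
print(p1)
print(p2)
```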
Changing background colour
To change the trellis background to white
trellis.par.set(background = list(col="white"))
To change plotting symbols
trellis.par.set(plot.symbol = list(pch=16, col="red", cex=1))
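trellis.par.set changes the settings for all later trellis plots. A sketch, assuming the built-in trees data, that saves the current symbol settings with trellis.par.get and restores them afterwards:

```r
library(lattice)

# Save the current plotting-symbol settings
old.symbol <- trellis.par.get("plot.symbol")

# Filled circles (pch = 16), red, normal size, for later trellis plots
trellis.par.set(plot.symbol = list(pch = 16, col = "red", cex = 1))
print(xyplot(Volume ~ Height, data = trees))

# Put the old settings back
trellis.par.set(plot.symbol = old.symbol)
```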
Equal.count
xyplot(volume~height|diameter, data=cherry.df)
[Figure: volume against height conditioned directly on diameter; many small panels, each with the strip label diameter]
Equal.count (2)
diam.gp<-equal.count(diameter,number=4,overlap=0)
xyplot(volume~height|diam.gp, data=cherry.df)
[Figure: volume against height in four panels, one for each diameter group (strip label diam.gp)]
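The same steps sketched with the built-in trees data, where Girth plays the role of diameter. Printing levels(diam.gp) shows the 4 intervals that equal.count has chosen.

```r
library(lattice)

# Cut Girth into 4 groups with about equal numbers of trees, no overlap
diam.gp <- equal.count(trees$Girth, number = 4, overlap = 0)
levels(diam.gp)  # the 4 intervals chosen

# One panel per diameter group
print(xyplot(Volume ~ Height | diam.gp, data = trees))
```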
Changing plotting symbols
To change plotting symbols
trellis.par.set(plot.symbol = list(pch=16, col="red", cex=1))
[Figure: the same four-panel plot of volume against height by diam.gp, drawn with the new plotting symbols]
Non-trellis version
[Figure: coplot of volume against height, given diameter (diameter axis from 10 to 18)]
coplot(volume~height|diameter, data=cherry.df)
Non-trellis version (2)
coplot(volume~height|diameter,data=cherry.df,number=4,overlap=0)
[Figure: coplot of volume against height, given diameter, with four non-overlapping diameter intervals]
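A sketch of the second coplot with the built-in trees data (Girth in place of diameter). The base R function co.intervals computes the intervals coplot uses for a numeric conditioning variable, which is handy for checking what the panels show.

```r
# coplot is part of base R graphics, so no library is needed
coplot(Volume ~ Height | Girth, data = trees, number = 4, overlap = 0)

# The Girth intervals behind the 4 panels (one row per interval)
co.intervals(trees$Girth, number = 4, overlap = 0)
```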
Other useful functions
Regular R
• scatterplot3d (3d scatter plot; load the library scatterplot3d)
• contour, persp (draw contour plots and surfaces)
• pairs (scatterplot matrix)
Trellis
• cloud (3d scatter plot)
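A sketch of pairs, persp, contour and cloud. pairs and cloud use the built-in trees data; persp and contour need a matrix of heights z over a grid of x and y values, so the surface here is a made-up bump, purely for illustration.

```r
library(lattice)

# Scatterplot matrix: every variable in trees plotted against every other
pairs(trees)

# persp and contour need z values on a grid; this bump surface is made up
x <- seq(-2, 2, length.out = 30)
y <- seq(-2, 2, length.out = 30)
z <- outer(x, y, function(x, y) exp(-(x^2 + y^2)))
persp(x, y, z, theta = 30, phi = 20)  # surface viewed from an angle
contour(x, y, z)                      # the same surface as contour lines

# Trellis 3d scatter plot: Volume against Girth and Height
print(cloud(Volume ~ Girth * Height, data = trees))
```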
Rotating plots
You need to install the R330 package.
Create a data frame, e.g. called data.df, with the response in the first column.
Then type
reg3d(data.df)