D Kelly O'Day, Learn R Toolkit

Module 3 R Script Basics

Jan 22, 2016

Page 1: Module 3  R Script Basics

Module 3 R Script Basics

Page 2: Module 3  R Script Basics

Module 3 R Script Basics: Goals

• Systematically start you on your R learning curve
• Introduce essential functions
• Demonstrate working R scripts
• Have you run and edit R scripts through assignments
• Provide building blocks for your own scripts

Page 3: Module 3  R Script Basics

Essential Tasks and Functions Covered in This Module

You can get complete R documentation on each function from the R Console with help("function") or ?function. R is case sensitive, so the function is help(), not Help().

For example

> help("for")

> ?read.table
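A minimal sketch of checking a function's arguments from a script rather than from its help page. args()-style lookups via formals() are base R; wrapping the help calls in interactive() is an assumption of this sketch so that it also runs in batch mode, and arg_names is a hypothetical variable name.

```r
# Open the help pages only when R is running interactively
if (interactive()) {
  help("for")      # reserved words such as "for" need quotes
  ?read.table      # shortcut form of help(read.table)
}

# List the argument names of read.table() without opening the help page
arg_names <- names(formals(read.table))
head(arg_names)
```

The argument names listed this way (sep, header, colClasses and so on) are the ones used in the scripts later in this module.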

Page 4: Module 3  R Script Basics

Module 3 R Scripts Menu

1. R’s File Path & Name Conventions

2. How to Read a Data File

3. Working with R Scripts

4. Working with Vectors & data.frames

5. Working with Dates

6. Subsetting & Factors


Page 5: Module 3  R Script Basics

Video 3-1: R’s File Path


Page 6: Module 3  R Script Basics

R’s File Path and Name Conventions

Get file path interactively

• R’s choose.files() function
– Brings up Select File window
– Lets users interactively select a data file

• Copy/paste the correctly formatted file name to your script
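One point worth knowing: choose.files() exists only in R for Windows, while file.choose() is available on every platform. The sketch below is one possible way to cover both cases; pick_file() is a hypothetical helper name, not something from the module's scripts, and the interactive() guard is an assumption so the code does not fail in batch mode.

```r
# Hypothetical helper: returns a file path chosen by the user,
# or NA when R is not running interactively (for example under Rscript)
pick_file <- function() {
  if (!interactive()) {
    return(NA_character_)
  }
  if (.Platform$OS.type == "windows") {
    utils::choose.files(multi = FALSE)   # Windows-only dialog, one file
  } else {
    file.choose()                        # works on all platforms
  }
}

link <- pick_file()
link   # the path comes back already in a format R accepts
```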

Issue

• Windows and R handle forward (/) and backward (\) slashes differently
• Windows path: "C:\Learn_R\Mod_3_R_Scripts"
• R treats \ as an escape character
• Need to adjust to R’s path conventions
• R’s 2 valid path formats: "C:/Learn_R/Mod_3_R_Scripts" or "C:\\Learn_R\\Mod_3_R_Scripts"
• R lets you use / or \\, not a single \
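The two valid formats can be compared directly in the console. This is a small sketch using the path from the slide; the variable names are made up for illustration, and no file needs to exist at that location because only the text of the path is examined.

```r
p_forward <- "C:/Learn_R/Mod_3_R_Scripts"
p_double  <- "C:\\Learn_R\\Mod_3_R_Scripts"

# Each "\\" typed in the script is stored as ONE backslash in the string,
# so both versions have the same number of characters
nchar(p_forward)
nchar(p_double)

# cat() shows the string as R stores it
cat(p_double, "\n")

# Converting backslashes to forward slashes gives the same path text
p_converted <- gsub("\\", "/", p_double, fixed = TRUE)
identical(p_converted, p_forward)

# file.path() builds a forward-slash path from its parts
p_built <- file.path("C:", "Learn_R", "Mod_3_R_Scripts")
p_built
```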

Page 7: Module 3  R Script Basics

Assignment 3-1: choose.files()

• Start R Session

• In R Console, Open Script: "C:/Learn_R/Mod_3_R_Script/Ex_Scr_3_1_choose_file.R"

• Save Script as: "C:/Learn_R/Mod_3_R_Script/Practice_3_1_choose.R"

• Edit Script to Read data file: "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_1_GISS_1980_By_year.txt"

Expected Result

Page 8: Module 3  R Script Basics

Video 3-2: How to Read a Data File

Page 9: Module 3  R Script Basics

How to Read Data File (Txt, CSV, Web Based)

read.table()

• May be the single most frequent function you will use

• Goal is to read data from source file, assign to data.frame

• Web Based Files: simply specify the URL rather than a path
– link <- "http://…..

Tip:
1. Use Notepad to look at data file
2. Print out first few lines of file
3. Use printout to fill in the ?? entries in the script

#################### Example R Script: #############
## Ex_Scr_3_2_read_file.R ##########################
## Script to read data file, list contents
## STEP 1: SETUP - Source File
rm(list=ls())
link <- "??"      # path or URL of your data file
my_data <- read.table(link,
    skip = ?, sep = "?", dec = ".",
    row.names = NULL, header = ?,
    colClasses = c("??", "??"),
    comment.char = "#",
    na.strings = c("", "*", "-", -99.9, -999.9),
    col.names = c("??", "??"))
my_data
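To see the read.table() arguments in action without the course data files, the sketch below writes a tiny comma-separated file to a temporary location and reads it back. The three data rows are made-up values for illustration only, not GISS data, and the temporary file stands in for the path or URL you would normally assign to link.

```r
# Write a small made-up data file: one comment line, then year,anomaly rows
link <- tempfile(fileext = ".txt")
writeLines(c("# made-up values for illustration only",
             "1980,0.20",
             "1981,-99.9",
             "1982,0.05"),
           link)

my_data <- read.table(link,
    skip = 0, sep = ",", dec = ".",
    row.names = NULL, header = FALSE,
    colClasses = c("numeric", "numeric"),
    comment.char = "#",                              # skip lines starting with #
    na.strings = c("", "*", "-", "-99.9", "-999.9"), # codes read as missing
    col.names = c("yr", "anom"))

my_data          # the -99.9 entry is shown as NA
str(my_data)     # both columns are numeric

unlink(link)     # remove the temporary file
```

The comment line is dropped by comment.char, and the -99.9 code becomes NA because it is listed in na.strings.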

Page 10: Module 3  R Script Basics

Assignment 3-2: Read Data File

• Start R Session

• In R Console, Open Script: "C:/Learn_R/Mod_3_R_Script/Ex_Scr_3_2_read_file.R"

• Save Script as: "C:/Learn_R/Mod_3_R_Script/Practice_3_2_read.R"

• Edit Script to Read data file: "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_1_GISS_1980_by_year.txt"

## Practice_3_2_read_file.R ###################
## Script to read data file, list contents
## STEP 1: SETUP - Source File
rm(list=ls())
link <- choose.files()
my_data <- read.table(link,
    skip = 0, sep = ",", dec = ".",
    row.names = NULL, header = F,
    colClasses = c("numeric", "numeric"),
    comment.char = "#",
    na.strings = c("", "*", "-", -99.9, -999.9),
    col.names = c("yr", "anom"))
my_data

Expected Result

Page 11: Module 3  R Script Basics

Video 3-3: Working with R Scripts


Page 12: Module 3  R Script Basics

Working with R Scripts

• Let’s look at the simple, structured R script below

• Many of our R scripts will handle similar sets of tasks:
1. Define source data file
2. Read data & assign to data.frame
3. Manipulate data
4. Produce charts / reports

• Things to Notice
– Extensive comments (#s)
– Delineation of steps
– Uses several arguments in the read.table() and plot() functions
– Indentation of arguments

• This script can be edited and reused for similar tasks

## Ex_Scr_3_3_work_w_scripts.R ###################
## Script to read 2 variable data file, produce XY plot
## STEP 1: SETUP & SOURCE FILE
rm(list=ls()); par(las=1)
link <- choose.files()
## STEP 2: READ DATA
my_data <- read.table(link,
    sep = ",", dec = ".", skip = 0,
    row.names = NULL, header = T,
    colClasses = c("numeric", "numeric"),
    na.strings = c("", "*", "-", -99.99,99.9, 999.9),
    col.names = c("Var1", "Var2"))
## STEP 3: MANIPULATE DATA
Title <- "Ex_scr_3_3.R Example Output\n Description of Data Set"
## STEP 4: CREATE PLOT
plot(Var2 ~ Var1, data = my_data, type = "l", col = "red", main = Title)

Page 14: Module 3  R Script Basics

Assignment 3-3: Edit R Script

• Go to Desktop

• Press R shortcut

• In R GUI, File > Open, Select C:\Learn_R\Mod_3_R_Script_Basics\Ex_Scr_3_3_work_w_scripts.R

• Save as: C:\Learn_R\Mod_3_R_Script_Basics\Practice_3_3_work_w_script.R

• Edit Practice_3_3_work_w_script.R file
– Edit Comment at top
– Edit col.names: c("Yr_frac", "CO2")
– Edit plot formula to use the new names: CO2 ~ Yr_frac
– Edit Title Line 2: "Monthly CO2 (ppmv) Mauna Loa, Hawaii"
– Change col = "blue"
– Save changes to Practice_3_3_…
– Control A + Control R to run

Source Data File: "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_2_CO2_by_month.txt"

## Ex_Scr_3_3_work_w_scripts.R ###################
## Script to read file, produce plot
## STEP 1: SETUP & SOURCE FILE
par(las=1)
link <- choose.files()
## STEP 2: READ DATA
my_data <- read.table(link,
    sep = ",", dec = ".", skip = 0,
    row.names = NULL, header = T,
    colClasses = c("numeric", "numeric"),
    na.strings = c("", "*", "-", -99.99,99.9, 999.9),
    col.names = c("Var1", "Var2"))
## STEP 3: MANIPULATE DATA
Title <- "Ex_scr_3_3.R Example Output\n Description of Data Set"
## STEP 4: CREATE PLOT
plot(Var2 ~ Var1, data = my_data, type = "l", col = "red", main = Title)

Page 15: Module 3  R Script Basics

Assignment 3-3: Expected Result

Practice_3_3_work_w_scripts.R script

#################### Example R Script ############
## Practice_3_3_work_w_scripts.R ###############
## Script to read file, produce plot
## STEP 1: SETUP - Source File
par(las=1)
link <- choose.files()
## STEP 2: READ DATA
my_data <- read.table(link,
    sep = ",", dec = ".", skip = 0,
    row.names = NULL, header = T,
    colClasses = c("numeric", "numeric"),
    na.strings = c("", "*", "-", -99.99,99.9, 999.9),
    col.names = c("Yr_frac", "CO2"))
## STEP 3: MANIPULATE DATA
Title <- "Practice_3_3_work_w_scripts.R Example Output\n Monthly CO2 (ppmv) Mauna Loa, Hawaii"
## STEP 4: CREATE PLOT
plot(CO2 ~ Yr_frac, data = my_data, type = "l", col = "blue", main = Title)

Page 16: Module 3  R Script Basics

Video 3-4: Vectors and data.frames

Page 17: Module 3  R Script Basics

Vectors and data.frames: What You Need to Know

• Vector Data Types
– Numeric (2.67)
– Character ("John Smith")
– Logical (TRUE)
– Factor ("Male")

• All items in a vector must be the same data type

• R will coerce all vector items to a single type

• Vector Names
– data.frame[column number]
– data.frame$col.name

• data.frame & vector indexes [ ]
– df[c] - column number in data.frame
– df[r,c] - row & column in data.frame
– v[r] - row number in vector

• Calculated variables are vectors

• Vectors are dynamic

• Number of rows in data.frame must be the same for each vector

• nrow() function counts number of data rows in data.frame

• length() function counts number of items in vector
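The points above can be tried out in the console with a small example. This is a sketch with made-up numbers; the names mixed, df, col2, cell and anom_x100 are chosen for illustration only.

```r
# Mixing types in one vector: R coerces everything to a single type
mixed <- c(2.67, "John Smith", TRUE)
class(mixed)                  # all three items are now character

# A small data.frame with two numeric vectors (made-up values)
df <- data.frame(yr = c(1980, 1981, 1982), anom = c(0.20, 0.26, 0.05))

col2 <- df[2]                 # column 2, returned as a one-column data.frame
cell <- df[3, 2]              # row 3, column 2
first_anom <- df$anom[1]      # first item of the anom vector

nrow(df)                      # number of data rows in the data.frame
length(df$anom)               # number of items in the vector

# A calculated variable is itself a vector and can be added as a new column
df$anom_x100 <- df$anom * 100
df
```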

Page 18: Module 3  R Script Basics

Functions that Create Vectors (in Addition to the read.table() Function)

c(), seq(), rep()

• 3 Ways to Enter vector items:

– Itemize
var_type <- c("character", "numeric", "numeric",
              "logical", "numeric", "numeric")

or

– Combine c() & rep()
var_type <- c("character", rep("numeric", 2),
              "logical", rep("numeric", 2))

or

– Combine c() & seq()
x <- c(seq(1,10,2), 11,14,18,19)

c() – "combine" Function

my_animals <- c("dog", "cat", "rabbit")
my_num <- c(1,8,11.2, 13,6, 19.13)

seq() – "sequence" Function – uniformly spaced series

my_numbers <- seq(a, b, inc)
a – start value
b – end value
inc – increment; 1 is default

num <- seq(3,17,2)   # (3,5,7,9,11,13,15,17)
uniform <- 5:9       # (5,6,7,8,9)

rep() – "replicate" Function

my_repeat_num <- rep(q, n)
q – number or character to be replicated
n – number of replications

my_rep <- rep("abc", 3)   # ("abc","abc","abc")
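A quick check that the itemized and the c() plus rep() approaches really build the same vector, and what the c() plus seq() combination produces. This is a sketch for the console; itemized and with_rep are illustrative names.

```r
# Two ways of writing the same column-type vector
itemized <- c("character", "numeric", "numeric",
              "logical", "numeric", "numeric")
with_rep <- c("character", rep("numeric", 2),
              "logical", rep("numeric", 2))
identical(itemized, with_rep)

# seq(1, 10, 2) stops at 9 because the next step (11) would pass the end value
x <- c(seq(1, 10, 2), 11, 14, 18, 19)
x
length(x)
```

A vector like var_type is handy as the colClasses argument of read.table() when a file has many columns.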

Page 19: Module 3  R Script Basics

How to Make Basic Vector Calculations: sum(), max(), min(), mean(), median()

Example
> r <- rnorm(10,100,5)   # creates vector with 10 random nos, mean = 100, sd = 5
> r_mean <- mean(r)      # calculate mean of vector r
> r_mean                 # output r_mean to console
[1] 98.16317

# If you may have missing values, use na.rm = T

max(vector, na.rm = T)
min(vector, na.rm = T)      # must remove NAs to get valid answer
mean(vector, na.rm = T)
median(vector, na.rm = T)
sum(vector, na.rm = T)

summary(vector) prints quartiles, mean, min, max
summary(data.frame) prints summary for each column
quantile(x, 0.9) finds 90th percentile
rnorm(n1, m, d) generates n1 random numbers with mean m & sd d
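The effect of na.rm is easiest to see on a short vector with one missing value. The numbers below are made up for illustration; with_na and the result names are not from the module's scripts.

```r
with_na <- c(4, 8, NA, 15, 16)

mean(with_na)                               # NA: one missing value spoils the result
v_mean   <- mean(with_na, na.rm = TRUE)     # 10.75
v_median <- median(with_na, na.rm = TRUE)   # 11.5
v_sum    <- sum(with_na, na.rm = TRUE)      # 43
v_max    <- max(with_na, na.rm = TRUE)      # 16

# quantile() also needs na.rm when values are missing
v_q90 <- quantile(with_na, 0.9, na.rm = TRUE)

# summary() reports the missing values separately
summary(with_na)
```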

Page 20: Module 3  R Script Basics

which() returns rows with specific value in vector

which(x == ??)

Returns the index for the row(s) of vector x that meet criteria ??. Use == (comparison), not = (assignment).

# Find index of max() value
vals <- c(1,3,2,68,11,13,19,8,49,4)
my_max <- max(vals)
which_val <- which(vals == my_max)
cat(c("Max = ", my_max, "val #", which_val), fill = 30)
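which() accepts any logical test, not just equality. The sketch below reuses the vals vector from the slide; which.max() is a base R shortcut for the position of the largest value, and big is an illustrative name.

```r
vals <- c(1, 3, 2, 68, 11, 13, 19, 8, 49, 4)

which_val <- which(vals == max(vals))   # 4: position of the largest value
which.max(vals)                         # same answer in one call

big <- which(vals > 15)                 # positions 4, 7, 9
vals[big]                               # the values at those positions: 68, 19, 49
```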

Page 21: Module 3  R Script Basics

attach() data.frame

• To use vectors in a data.frame, you must include the data.frame name
– data.frame$col.name
or
– data.frame[column number]

• Or use attach(data.frame), which adds the data.frame to R’s search path
– Vectors in data.frame can be accessed by name
– Saves having to use data.frame$ before vector name

• detach(data.frame): good idea to remove the data.frame from the search path when done
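A short sketch of the attach() and detach() cycle with made-up values. The with() call at the end is base R and is shown only as an alternative that gives the same result without changing the search path; it is not part of the module's scripts, and temp_df is an illustrative name.

```r
temp_df <- data.frame(yr = c(1980, 1981), anom = c(0.20, 0.26))

attach(temp_df)                 # temp_df is now on the search path
"temp_df" %in% search()         # TRUE while attached
m_attached <- mean(anom)        # anom found by name, no temp_df$ needed
detach(temp_df)                 # remove it again when done

# Alternative: evaluate an expression inside the data.frame without attaching
m_with <- with(temp_df, mean(anom))

c(m_attached, m_with)
```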

Page 22: Module 3  R Script Basics

Assignment 3-4: Working with a Vector

• Start with New Script File

• Save as: C:\Learn_R\Mod_3_R_Script_Basics\Practice_3_4_vectors.R

• Create vals vector: vals <- c(1,3,5,7,21,4,12.2,19.12,21)

• Make these calculations

summary(vals)
length(vals)
mean(vals)
which(vals==max(vals))

Expected Result

Page 23: Module 3  R Script Basics

Video 3-5: Working with Dates

Page 24: Module 3  R Script Basics

Working with Dates: What You Need to Know

• R, like Excel, treats dates in a special way!!

• R stores dates as the number of days since Jan. 1, 1970
– Before 1/1/70: negative
– After 1/1/70: positive

• Read dates as "character" vector

• Use as.Date() to convert to date vector

original              as.Date(original, " ")
1/13/07               "%m/%d/%y"
Jan 13, 2007          "%b %d, %Y"
2007-Jan-1            "%Y-%b-%d"
January 13, 2007      "%B %d, %Y"

Be sure to specify any delimiters in dates: / , - *

• Input dates must include day, month and year, in any order

• as.Date(char_v, "%m/%d/%y") specifies how dates are organized
– %d - day of month (1-31)
– %m - month number (1-12)
– %b - month abbreviation (Jan)
– %B - full month name (January)
– %y - 2 digit year (08)
– %Y - 4 digit year (2008)

my_date <- as.Date(char_v, "%m/%d/%y")
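The conversions in the table can be tried directly in the console. This is a sketch; note that %b and %B match month names in your computer's current language setting, so the two month-name lines assume an English locale and return NA otherwise.

```r
# Numeric formats work the same in every locale
d1 <- as.Date("1/13/07", "%m/%d/%y")
d1                                   # printed in R's standard yyyy-mm-dd form
as.numeric(d1)                       # days since Jan. 1, 1970

as.numeric(as.Date("1970-01-01"))    # 0: the starting point
as.numeric(as.Date("1969-12-31"))    # -1: dates before the start are negative

# Month-name formats (assume an English locale)
d2 <- as.Date("Jan 13, 2007", "%b %d, %Y")
d3 <- as.Date("January 13, 2007", "%B %d, %Y")

# format() uses the same codes to turn a date back into text
format(d1, "%Y")
months(d1)                           # full month name of the date
```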

Page 25: Module 3  R Script Basics

Reading Date Character Data: Converting to R Dates

## Script to demonstrate character date input & conversion to R date

## STEP 1: SETUP - Source File
link <- "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_3_GISS_by_month.txt"

## STEP 2: READ DATA
my_data <- read.table(link,
    sep = ",", dec = ".", skip = 1,
    row.names = NULL, header = F,
    colClasses = c("character", "numeric", "factor"),
    na.strings = c("", "*", "-", -99.99,99.9, 999.9),
    col.names = c("char_date", "T_anom", "Enso_f"))

## STEP 3: Convert character dates to R dates, then get month values
r_date <- as.Date(my_data$char_date, "%m/%d/%Y")
r_mo <- months(r_date)

## STEP 4: New data.frame - add r_date & r_mo vectors
my_data_1 <- data.frame(my_data, r_date, r_mo)
attach(my_data_1)

head(my_data_1)

Data File Example: dates are character strings

Page 26: Module 3  R Script Basics

Assignment 3-5: Working with Dates

• Start R Session

• In R Console, Open Script: "C:/Learn_R/Mod_3_R_Script/Ex_Scr_3_5_Date_conv.R"

• Run Script to Read data file: "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_3_GISS_by_month.txt"

• Things to Notice
– Creation of new data.frame
– Use of attach() function
– Use of as.Date()
– Use of head()

Print out & read R documentation for as.Date() & months()

?as.Date
?months

Page 27: Module 3  R Script Basics

Video 3-6: Functions to Subset & Summarize Data

Page 28: Module 3  R Script Basics

subset(): Subset Data and Calculate Summary Values

• R lets you quickly define subsets of data and calculate summary statistics for the subset

• Goal: calculate average temperature anom for 1930s decade

subset() function

dec_subset <- subset(df, vector == ?)

• Approach:
– Calculate decade for each row
– Subset rows with decade == 1930
– Calculate average for subset

which_decade <- 1930
decade <- as.integer(my_data$yr/10)*10
my_data <- data.frame(my_data, decade)
attach(my_data)
decade_subset <- subset(my_data, decade == which_decade)
decade_avg <- mean(decade_subset$anom)
cbind(which_decade, decade_avg)

Data File Example
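The decade calculation and subset can be tested without the GISS file. The sketch below builds a small data.frame with made-up anomaly values for 1928 to 1941 and follows the same three steps; it adds the decade column with $ instead of attach(), which is a choice of this sketch.

```r
# Made-up yearly values for illustration only (not GISS data)
my_data <- data.frame(
  yr   = 1928:1941,
  anom = c(-0.2, -0.3,
           -0.1, -0.1, -0.1, -0.2, -0.1, -0.2, -0.1, 0.0, 0.0, -0.1,
            0.1,  0.2))

# Step 1: decade for each row, e.g. 1934 / 10 = 193.4 -> 193 -> 1930
my_data$decade <- as.integer(my_data$yr / 10) * 10

# Step 2: keep only the rows of the chosen decade
which_decade  <- 1930
decade_subset <- subset(my_data, decade == which_decade)
nrow(decade_subset)            # ten years: 1930 through 1939

# Step 3: average for the subset
decade_avg <- mean(decade_subset$anom)
cbind(which_decade, decade_avg)
```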

Page 29: Module 3  R Script Basics

for (i in a:b) { }: How to Use for Loop & subset()

• What if we want the average for each decade?

• Combine for loop & subset()
for (i in a:b) { subset(df, vector == ?) }

• Approach:
– Calculate decade for each row
– Subset rows by decade
– Calculate average for each decade subset

Data File

## STEP 3: CALC DECADE MEANS
decade <- as.integer(my_data$yr/10)*10
my_data <- data.frame(my_data, decade)

attach(my_data)
dec_list <- seq(1880, 2000, 10)
num_dec <- length(dec_list)
dec_avg <- numeric(num_dec)     # empty result vector, one slot per decade
for (i in 1:num_dec) {
    dec_subset <- subset(my_data, decade == dec_list[i])
    dec_avg[i] <- mean(dec_subset$anom, na.rm = T)
}
cbind(dec_list, dec_avg)
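A self-contained version of the loop on a handful of made-up rows, so the result can be checked by hand. The data values and the use of seq_along() in place of 1:num_dec are choices of this sketch, not taken from the module's script.

```r
# Made-up values for illustration only; one anomaly is missing
my_data <- data.frame(yr   = c(1921, 1925, 1932, 1938, 1944),
                      anom = c(-0.2, -0.4, -0.1,   NA,  0.1))
my_data$decade <- as.integer(my_data$yr / 10) * 10

dec_list <- seq(1920, 1940, 10)
dec_avg  <- numeric(length(dec_list))    # one slot per decade

for (i in seq_along(dec_list)) {
  dec_subset <- subset(my_data, decade == dec_list[i])
  dec_avg[i] <- mean(dec_subset$anom, na.rm = TRUE)   # NA in 1938 is skipped
}

cbind(dec_list, dec_avg)
```

By hand: the 1920s average is (-0.2 + -0.4) / 2 = -0.3, the 1930s have one usable value (-0.1), and the 1940s have one value (0.1).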

Page 30: Module 3  R Script Basics

tapply(): How to Summarize Data by Factor

Another way to get average for all decades?

1. Create decade_f factor: as.factor(decade)
2. Summarize by decade_f

tapply(x, INDEX, FUN)
Applies FUNction (mean, max, etc.) to the values of x within each level of factor INDEX

Data File

## STEP 3: CALC DECADE MEANS
decade <- as.integer(my_data$yr/10)*10
decade_f <- as.factor(decade)
my_data <- data.frame(my_data, decade_f)

attach(my_data)
dec_avg <- tapply(anom, INDEX = decade_f, mean)
cbind(dec_avg)
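The same made-up rows used to illustrate decade averages can be summarized with a single tapply() call. In this sketch the extra na.rm = TRUE is passed through tapply() to mean(), and the columns are referenced with my_data$ rather than attach(); both are choices of the sketch.

```r
# Made-up values for illustration only; one anomaly is missing
my_data <- data.frame(yr   = c(1921, 1925, 1932, 1938, 1944),
                      anom = c(-0.2, -0.4, -0.1,   NA,  0.1))

decade <- as.integer(my_data$yr / 10) * 10
my_data$decade_f <- as.factor(decade)
levels(my_data$decade_f)          # one level per decade found in the data

# One call replaces the whole for loop; extra arguments go on to mean()
dec_avg <- tapply(my_data$anom, INDEX = my_data$decade_f,
                  FUN = mean, na.rm = TRUE)
cbind(dec_avg)                    # row names are the factor levels
```

Unlike the loop, tapply() needs no list of decades in advance: the factor levels decide which groups are summarized.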

Page 31: Module 3  R Script Basics

Assignment 3-6: subset() & mean()

• Start R Session

• In R Console, Open Script: "C:/Learn_R/Mod_3_R_Script/Ex_Scr_3_6_subset_data_mean.R"

• Run Script to Read data file: "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_4_GISS_By_year.csv"

• Edit which_decade to 1940 & Rerun script

Expected Result

Print out & read R documentation for subset() & mean()

?subset
?mean

Page 32: Module 3  R Script Basics

Assignment 3-7: as.factor() & tapply()

• Start R Session

• In R Console, Open Script: "C:/Learn_R/Mod_3_R_Script/Ex_Scr_3_8_factor_tapply.R"

• Run Script to Read data file: "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_4_GISS_By_year.csv"

• Things to Notice
– Creation of new data.frame
– Use of attach() function
– Use of as.factor()
– Use of tapply()

Print out & read R documentation for as.factor() & tapply()

?factor
?tapply