R Basics v10 - Learning Research and Development Center · 2019. 1. 9.
R Basics / Course Business
• We’ll be using a sample dataset in class today:
  - CourseWeb: Course Documents → Sample Data → Week 2
  - Can download to your computer before class
• Thanks for answering the CourseWeb background survey!
• If sitting in on the course, e-mail me so I can add you to CourseWeb
R Basics
• R commands & functions
• Reading in data
• Saving R scripts
• Descriptive statistics
• Subsetting data
• Assigning new values
• Referring to specific cells
• Types & type conversion
• NA values
• Getting help
R Commands
• Simplest way to interact with R is by typing in commands at the > prompt
[Screenshots: the > prompt in RStudio and in R]
R as a Calculator
• Typing in a simple calculation shows us the result:
  - 608 + 28
• More complex calculations can be done with functions:
  - sqrt(64)
  - sqrt: what the function is (square root)
  - In parentheses: what we want to perform the function on
• Can often read these left to right (“square root of 64”)
• What do you think this means?
  - abs(-7)
Arguments
• Some functions have settings (“arguments”) that we can adjust:
  - round(3.14): Rounds off to the nearest integer (zero decimal places)
  - round(3.14, digits=1): One decimal place
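As a small sketch of how arguments work, the lines below run the two round() calls from the slide and add a third call that passes the same argument by position instead of by name. The variable names are made up for illustration.

```r
# Arguments adjust how a function behaves.
default_round <- round(3.14)              # digits defaults to 0
one_decimal   <- round(3.14, digits = 1)  # argument matched by name
by_position   <- round(3.14, 1)           # same argument matched by position

print(default_round)  # 3
print(one_decimal)    # 3.1
print(by_position)    # 3.1
```

Naming the argument (digits = 1) is usually easier to read when you come back to a script later.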
Nested Functions
• We can use multiple functions in a row, one inside another
  - sqrt(abs(-16))
  - “Square root of the absolute value of -16”
• Don't get scared when you see multiple parentheses!
  - Can often just read left to right
  - R first figures out the thing nested in the middle
• Can you round off the square root of 7?
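One way to see the inside-out evaluation is to compute the inner step on its own and compare it with the nested call. This sketch also tries the exercise above; the variable names are only for illustration.

```r
# R evaluates the innermost call first, then works outward.
inner  <- abs(-16)        # 16
nested <- sqrt(abs(-16))  # sqrt(16), which is 4

# The same pattern applied to the exercise: round off the square root of 7.
rounded_root     <- round(sqrt(7))              # sqrt(7) is about 2.65, so this is 3
rounded_root_2dp <- round(sqrt(7), digits = 2)  # 2.65

print(nested)
print(rounded_root)
print(rounded_root_2dp)
```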
Using Multiple Numbers at Once
• When we want to use multiple numbers, we concatenate them with c()
• c(2,6,16)
  - A vector holding the numbers 2, 6, and 16
• Sometimes a computation requires multiple numbers
  - mean(c(2,6,16))
• Also a quick way to do the same thing to multiple different numbers:
  - sqrt(c(16,100,144))
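A short sketch of both uses of c(): feeding several numbers to one computation, and applying one function to each number in turn. The variable names are illustrative.

```r
numbers <- c(2, 6, 16)           # concatenate three numbers into one vector

avg   <- mean(numbers)           # one result from many numbers: 8
roots <- sqrt(c(16, 100, 144))   # one result per number: 4 10 12

print(avg)
print(roots)
print(length(numbers))           # 3
```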
Course Documents: Sample Data: Week 2
• Reading plausible versus implausible sentences
  - “Scott chopped the carrots with a knife.”
  - “Scott chopped the carrots with a spoon.”
• Measure reading time on final word
• Note: Simulated data; not a real experiment.
Course Documents: Sample Data: Week 2
• Reading plausible versus implausible sentences
• Reading time on critical word
• 36 subjects
• Each subject sees 30 items (sentences): half plausible, half implausible
• Interested in changes over time, so we’ll track number of trials remaining (29 vs 28 vs 27 vs 26…)
Reading in Data
• Make sure you have the dataset at this point if you want to follow along:
  - Course Documents → Sample Data → Week 2
Reading in Data – RStudio
• Navigate to the folder in lower-right
• More -> Set as Working Directory
• Open a “comma-separated value” file:
  - experiment <- read.csv('week2.csv')
  - experiment: name of the “dataframe” we’re creating (whatever we want to call this dataset)
  - read.csv: the function name
  - 'week2.csv': the file name
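If you do not have week2.csv at hand, the following sketch shows the same read.csv() call on a tiny made-up file. The values, the temporary file, and the choice of three columns are assumptions for illustration only; the column names are borrowed from later slides. stringsAsFactors = TRUE is set explicitly because that was the read.csv() default only before R 4.0.0.

```r
# Made-up stand-in for week2.csv: invented values, written to a temporary file.
toy <- data.frame(
  Subject   = c("S01", "S01", "S02", "S02"),
  Condition = c("Plausible", "Implausible", "Plausible", "Implausible"),
  RT        = c(512, 640, 498, 701)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# Same call as on the slide, with the temporary path swapped in for 'week2.csv'.
experiment <- read.csv(csv_path, stringsAsFactors = TRUE)

head(experiment)  # first few rows
str(experiment)   # column names and types
```

Typing the dataframe name by itself (experiment) prints the whole thing, which is fine for a toy file but unwieldy for a real dataset; head() and str() scale better.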
Reading in Data – Regular R
• Read in a “comma-separated value” file:
R Scripts
• Save & reuse commands with a script
[Screenshots: creating a new script in RStudio; in R, File -> New Document]
R Scripts
• Run commands without typing them all again
• RStudio:
  - Code -> Run Region -> Run All: Run entire script
  - Code -> Run Line(s): Run just what you’ve highlighted/selected
• R:
  - Highlight the section of script you want to run
  - Edit -> Execute
• Keyboard shortcut for this:
  - Ctrl+Enter (PC), ⌘+Enter (Mac)
R Scripts
• Saves time when re-running analyses
• Other advantages?
• Some:
  - Documentation for yourself
  - Documentation for others
  - Reuse with new analyses/experiments
  - Quicker to run: can automatically perform one analysis after another
R Scripts – Comments
• Add # before a line to make it a comment
  - Not commands to R, just notes to self (or other readers)
• Can also add a # to make the rest of a line a comment
  - summary(experiment$Subject) #awesome
Descriptive Statistics
• Remember how we referred to a particular variable in a dataframe?
  - $
• Combine that with functions:
  - mean(experiment$RT)
  - median(experiment$RT)
  - sd(experiment$RT)
• Or, for a categorical variable:
  - levels(experiment$ItemName)
  - summary(experiment$Subject)
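The calls above can be tried on a small made-up dataframe. Everything in this sketch (subject labels, item names, RTs) is invented for illustration and is not the course dataset.

```r
# Toy data: three subjects, two items each (all values invented).
experiment <- data.frame(
  Subject  = factor(c("S01", "S01", "S02", "S02", "S03", "S03")),
  ItemName = factor(c("Carrots", "Soup", "Carrots", "Soup", "Carrots", "Soup")),
  RT       = c(500, 620, 480, 700, 550, 750)
)

rt_mean   <- mean(experiment$RT)    # 600
rt_median <- median(experiment$RT)  # 585
rt_sd     <- sd(experiment$RT)      # sample standard deviation

item_levels    <- levels(experiment$ItemName)  # the distinct categories
subject_counts <- summary(experiment$Subject)  # number of rows per subject

print(rt_mean)
print(rt_median)
print(item_levels)
print(subject_counts)
```

Note that levels() only works on factors; on a plain character column it returns NULL.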
Descriptive Statistics
• We often want to look at a dependent variable as a function of some independent variable(s)
  - tapply(experiment$RT, experiment$Condition, mean)
  - “Split up the RTs by Condition, then get the mean”
• Try getting the mean RT for each item
• How about the median RT for each subject?
• To combine multiple results into one table,
Subsetting Data
• Often, we want to examine or use just part of a dataframe
• Remember how we read our dataframe?
  - experiment <- read.csv(...)
• Create a new dataframe that's just a subset of experiment:
  - experiment.LongRTsRemoved <- subset(experiment, RT < 2000)
  - experiment.LongRTsRemoved: new dataframe name
  - experiment: original dataframe
  - RT < 2000: inclusion criterion (RT less than 2000 ms)
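A minimal sketch of the subset() call above, run on four invented rows so the effect is easy to see. Note that the original dataframe is left untouched.

```r
# Toy data (invented values); one RT sits exactly on the 2000 ms boundary.
experiment <- data.frame(
  Subject = c("S01", "S02", "S03", "S04"),
  RT      = c(150, 480, 2000, 2600)
)

# Keep only the rows that meet the inclusion criterion.
experiment.LongRTsRemoved <- subset(experiment, RT < 2000)

print(experiment.LongRTsRemoved$RT)  # 150 480 (2000 is not less than 2000)
print(nrow(experiment))              # 4: the original still has every row
```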
Subsetting Data: Logical Operators
• Try getting just the observations with RTs 200 ms or more:
  - experiment.ShortRTsRemoved <- subset(experiment, RT >= 200)
• Why not just delete the bad RTs from the spreadsheet?
  - Easy to make a mistake / miss some of them
  - Faster to have the computer do it
  - We’d lose the original data
  - No documentation of how we subsetted the data
Subsetting Data: AND and OR
• What if we wanted only RTs between 200 and 2000 ms?
  - Could do two steps:
  - experiment.Temp <-
• Note that these subsets are just creating new dataframes in R
• If you want to save to a folder on your computer, use write.csv():
  - write.csv(experiment.BadRTsRemoved, file='experiment_badremoved.csv')
Logical Operators Review
• Summary
  - >     Greater than
  - >=    Greater than or equal to
  - <     Less than
  - <=    Less than or equal to
  - &     AND
  - |     OR
  - ==    Equal to
  - !=    Not equal to
  - %in%  Is this included in a list?
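The operators in this summary can be combined inside subset(). The sketch below uses invented data, and the choice of inclusive bounds (>= 200 and <= 2000) is an assumption made for illustration, not necessarily the cutoffs used in class.

```r
# Toy data (invented values).
experiment <- data.frame(
  Subject     = c("S01", "S02", "S03", "S04", "S05"),
  TestingRoom = c(1, 2, 3, 2, 1),
  RT          = c(150, 480, 900, 2600, 2000)
)

# AND: both conditions must hold.
between <- subset(experiment, RT >= 200 & RT <= 2000)

# OR: either condition is enough.
extreme <- subset(experiment, RT < 200 | RT > 2000)

# Not equal to, and membership in a set of values.
not_room2 <- subset(experiment, TestingRoom != 2)
rooms_1_3 <- subset(experiment, TestingRoom %in% c(1, 3))

print(between$RT)         # 480 900 2000
print(extreme$RT)         # 150 2600
print(not_room2$Subject)  # S01 S03 S05
print(rooms_1_3$Subject)  # same three subjects
```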
Let’s Practice It!
• Try getting the mean RT for each number of TrialsRemaining (29 trials remaining, 28 trials remaining, etc.)
  - tapply(experiment$RT, experiment$TrialsRemaining, mean)
• Try getting a subset of just the people in TestingRoom 3
  - experiment.Room3 <- subset(experiment, TestingRoom == 3)
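To check the two answers above without the course file, here is a sketch on six invented rows. The column names follow the slides; the values are made up.

```r
# Toy data (invented values): three subjects, two trials each.
experiment <- data.frame(
  Subject         = factor(c("S01", "S01", "S02", "S02", "S03", "S03")),
  TestingRoom     = c(3, 3, 2, 2, 3, 3),
  TrialsRemaining = c(29, 28, 29, 28, 29, 28),
  RT              = c(600, 500, 640, 560, 620, 530)
)

# Split the RTs by TrialsRemaining, then take the mean of each group.
mean_by_trial <- tapply(experiment$RT, experiment$TrialsRemaining, mean)
print(mean_by_trial)  # named by group: "28" is 530, "29" is 620

# Keep only the rows from testing room 3 (note the double == for "equal to").
experiment.Room3 <- subset(experiment, TestingRoom == 3)
print(nrow(experiment.Room3))  # 4
```

Swapping mean for median, or TrialsRemaining for Subject, gives the other grouped summaries in the same way.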
Assignment
• Remember the pointing arrow used to create dataframes and subsets?
  - e.g., experiment <- read.csv(...)
• This is the assignment operator. It saves results or values in a variable
  - x <- sqrt(64)
  - CriticalTrialsPerSubject <- 30
• Remember, typing a name by itself shows you the current value:
  - CriticalTrialsPerSubject
• Assigning a new value overwrites the old
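A brief sketch of assignment and overwriting. The second value (40) and the variable first_value are arbitrary choices used only to show the old value being replaced.

```r
x <- sqrt(64)                   # store the result of a computation
print(x)                        # 8

CriticalTrialsPerSubject <- 30  # store a plain value
first_value <- CriticalTrialsPerSubject

# Assigning again replaces the old value; R gives no warning.
CriticalTrialsPerSubject <- 40
print(first_value)               # 30 (a copy taken before the overwrite)
print(CriticalTrialsPerSubject)  # 40
```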
Assignment
• We can use this to create new columns in our dataframe:
  - experiment$ExperimentNumber <- 1
  - Here, the same number (1) is assigned to every trial
• Or, compute a value for each row:
  - experiment$RTinSeconds <- experiment$RT / 1000
  - For each trial, finds the RT in seconds for that specific trial and saves that into RTinSeconds
  - Similar to an Excel formula
Assignment
• We can use this to create new columns in our dataframe:
  - experiment$ExperimentNumber <- 1
  - Here, the same number (1) is assigned to every trial
• Another example:
  - experiment$SerialPosition <- 30 - experiment$TrialsRemaining
  - For each trial, calculates the serial position (trial #1, trial #2, etc.) and saves the result into SerialPosition
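The three column assignments from these slides, run on a three-row made-up dataframe so each new column can be inspected. The RT and TrialsRemaining values are invented.

```r
# Toy data (invented values).
experiment <- data.frame(
  RT              = c(512, 1480, 950),
  TrialsRemaining = c(29, 28, 27)
)

experiment$ExperimentNumber <- 1                             # one value, repeated for every row
experiment$RTinSeconds <- experiment$RT / 1000               # computed row by row
experiment$SerialPosition <- 30 - experiment$TrialsRemaining # 1, 2, 3

print(experiment)
```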
ifelse()
IF YOU WANT DESSERT, EAT YOUR PEAS
… OR ELSE!
ifelse()
• ifelse(): Use a test to decide which of two values to assign:
  - experiment$Half <- ifelse(experiment$SerialPosition <= 15, 1, 2)
• Possible to nest ifelse() if we need more than 2 categories:
• Using the CriticalTrialsPerSubject variable in place of a typed-in 15:
  - Explains where the 15 comes from: helpful if we come back to this script later
  - We can also refer to the CriticalTrialsPerSubject variable later in the script & this ensures it’s consistent
  - Easy to update if we change the number of critical trials
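The sketch below is one possible way to write these ideas, not necessarily the code shown in class. It assumes the cutoff of 15 is half of CriticalTrialsPerSubject, and the three-way split with cutoffs at 10 and 20 is a made-up example of nesting.

```r
CriticalTrialsPerSubject <- 30

# Toy data (invented values).
experiment <- data.frame(SerialPosition = c(1, 10, 15, 16, 25, 30))

# Two categories, with the cutoff derived from the variable (assumption: 30 / 2 = 15).
experiment$Half <- ifelse(experiment$SerialPosition <= CriticalTrialsPerSubject / 2, 1, 2)

# Three categories: the "no" branch of the first test is itself another ifelse().
experiment$Third <- ifelse(experiment$SerialPosition <= 10, 1,
                    ifelse(experiment$SerialPosition <= 20, 2, 3))

print(experiment$Half)   # 1 1 1 2 2 2
print(experiment$Third)  # 1 1 2 2 3 3
```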
Deleting Variables
• It is also possible to delete a variable by assigning it the value of NULL
  - experiment$TrialsRemaining <- NULL
  - Since we now have SerialPosition, maybe we don’t want to bother keeping TrialsRemaining any more
Referring to Specific Cells
• So far, we’ve seen how to
  - Create a new dataframe that’s a subset of an existing dataframe
  - Modify a dataframe by creating an entire column
• What if we want to modify a dataframe by adjusting some existing values?
  - e.g., replace all RTs above 2000 ms with the number 2000 (“fencing”)
  - Creating a new subset won’t work because we want to change the original dataframe
  - Need a way to edit specific values
Referring to Specific Cells
• Use square brackets [ ] to refer to specific entries in a dataframe:
  - Row, column
  - experiment[3,7]
• Omit the row or column number to mean all rows or all columns:
  - experiment[3,]       Row 3, all columns
  - experiment[,4]       All rows in column 4
• Can also use column names:
  - experiment[,'RT']    All rows in the RT column
• Remember c()? We can check multiple rows:
  - experiment[c(1:4),]
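A sketch of the same bracket patterns on a made-up dataframe with only three columns, so the column numbers differ from the slide (RT is column 3 here, not 7 or 4). All values are invented.

```r
# Toy data (invented values).
experiment <- data.frame(
  Subject   = c("S01", "S02", "S03", "S04", "S05"),
  Condition = c("Plausible", "Implausible", "Plausible", "Implausible", "Plausible"),
  RT        = c(510, 640, 495, 720, 530)
)

one_cell   <- experiment[3, 3]       # row 3, column 3
one_row    <- experiment[3, ]        # row 3, all columns
by_number  <- experiment[, 3]        # all rows in column 3
by_name    <- experiment[, 'RT']     # the same column, referred to by name
first_four <- experiment[c(1:4), ]   # rows 1 through 4 (plain 1:4 also works)

print(one_cell)          # 495
print(one_row)
print(nrow(first_four))  # 4
```

Referring to columns by name is usually safer than by number, since the name still works if columns are added or reordered.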
Logical Indexing
• We can look at rows or columns that meet a specific criterion...
  - experiment[experiment$RT < 200,]
• Can use this as another way to subset:
  - experiment.ShortRTsRemoved <- experiment[experiment$RT >= 200, ]
  - subset() does essentially this (it also drops rows where the test comes out NA)
• But we can also set values this way
  - experiment[experiment$RT < 200, 'RT'] <- 200
  - In the dataframe experiment, find the rows where RT < 200, and set the column RT to 200
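A sketch of logical indexing used both to look at rows and to overwrite values in place. The data are invented, and the upper fence at 2000 ms repeats the "fencing" example mentioned earlier.

```r
# Toy data (invented values).
experiment <- data.frame(
  Subject = c("S01", "S02", "S03", "S04", "S05"),
  RT      = c(150, 480, 2600, 190, 900)
)

# Look at the rows that meet a criterion.
too_short <- experiment[experiment$RT < 200, ]
print(too_short)

# Overwrite only the matching cells of the RT column.
experiment[experiment$RT < 200, 'RT'] <- 200    # lower fence
experiment[experiment$RT > 2000, 'RT'] <- 2000  # upper fence

print(experiment$RT)  # 200 480 2000 200 900
```

Unlike subset(), this changes the original dataframe, so it is worth keeping the raw file untouched on disk.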
Types
• R treats continuous & categorical variables differently
• These are different data types:
  - Numeric
  - Factor: Variable w/ fixed set of categories (e.g., treatment vs. placebo)
  - Character: Freely entered text (e.g., open response question)
Types
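• R's heuristic when reading in data:
  - Letters anywhere in the column → factor (the read.csv() default before R 4.0.0; from R 4.0.0 on, such columns are read as character unless stringsAsFactors=TRUE is set)
  - No letters, purely numbers → numeric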
Type Conversion: Numeric → Factor
• Sometimes we need to correct this
  - Room 4 is not “twice as much” as Room 2
• Create a new column that's the factor (categorical) version of TestingRoom:
  - experiment$Room.Factor <- as.factor(experiment$TestingRoom)
• Or, just overwrite the old column:
  - experiment$TestingRoom <- as.factor(experiment$TestingRoom)
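A sketch of why the conversion matters, using five invented room numbers. While the column is numeric, R will happily average the room labels; once it is a factor, R treats them as categories.

```r
# Toy data (invented values).
experiment <- data.frame(TestingRoom = c(2, 4, 3, 2, 4))

room_mean <- mean(experiment$TestingRoom)  # 3: computable, but meaningless for room labels

# Keep the original column and add a categorical version next to it.
experiment$Room.Factor <- as.factor(experiment$TestingRoom)

room_levels <- levels(experiment$Room.Factor)   # "2" "3" "4"
room_counts <- summary(experiment$Room.Factor)  # rows per room: 2, 1, 2

print(room_mean)
print(room_levels)
print(room_counts)
```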
Type Conversion: Character → Factor
• When ifelse() results in words, R creates a character variable rather than a factor
  - Need to convert it
• To change a factor to a number, need to turn it into a character first:
  - experiment$Age.Numeric <- as.numeric(as.character(experiment$Age))
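The sketch below shows what goes wrong when the as.character() step is skipped, and then converts an ifelse() result to a factor. The ages, the AgeGroup column, and the cutoff of 21 are invented for illustration.

```r
# Toy data (invented values): ages stored as a factor, as if read from a file.
experiment <- data.frame(Age = factor(c("19", "23", "21", "19")))

# Converting a factor straight to numeric returns the internal level codes.
wrong <- as.numeric(experiment$Age)
print(wrong)  # 1 3 2 1, not the ages

# Going through character first recovers the actual numbers.
experiment$Age.Numeric <- as.numeric(as.character(experiment$Age))
print(experiment$Age.Numeric)  # 19 23 21 19

# ifelse() with words gives a character column; as.factor() converts it.
experiment$AgeGroup <- ifelse(experiment$Age.Numeric >= 21, "Older", "Younger")
group_class_before <- class(experiment$AgeGroup)  # "character"
experiment$AgeGroup <- as.factor(experiment$AgeGroup)
print(levels(experiment$AgeGroup))  # "Older" "Younger"
```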
NA
• We might have run into some problems trying to change Age into a numerical variable...
• NA means “not available”...
  - Characters that don't convert to numbers
  - Missing data in a spreadsheet
  - Invalid computations
NA
• If we try to do computations on a set of numbers where any of them is NA, we get NA as a result...
  - mean(experiment$Age.Numeric)
• R wants you to think about how you want to treat these missing values
NA – Solutions
• To ignore the NAs when doing a specific computation, use na.rm=TRUE:
  - sd(experiment$Age.Numeric, na.rm=TRUE)
• To get a copy of the dataframe that excludes all rows with an NA (in any column):
  - experiment.NoNAs <- na.omit(experiment)
• Change NAs to something else with logical indexing:
  - experiment[is.na(experiment$Age.Numeric), 'Age.Numeric'] <- 23
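A sketch that produces an NA the way the slides describe (text that does not convert to a number) and then applies each of the three solutions. The data, including the entry "twenty", are invented; suppressWarnings() is used only to hide R's "NAs introduced by coercion" warning in this demo.

```r
# Toy data (invented values); "twenty" cannot be converted to a number.
experiment <- data.frame(
  Subject = c("S01", "S02", "S03", "S04"),
  Age     = c("19", "twenty", "23", "21"),
  stringsAsFactors = TRUE
)
experiment$Age.Numeric <- suppressWarnings(as.numeric(as.character(experiment$Age)))
print(experiment$Age.Numeric)  # 19 NA 23 21

mean_with_na  <- mean(experiment$Age.Numeric)                # NA
mean_ignoring <- mean(experiment$Age.Numeric, na.rm = TRUE)  # 21

# Copy of the dataframe without any row that contains an NA.
experiment.NoNAs <- na.omit(experiment)
print(nrow(experiment.NoNAs))  # 3

# Replace the NAs in place using logical indexing.
experiment[is.na(experiment$Age.Numeric), 'Age.Numeric'] <- 23
print(experiment$Age.Numeric)  # 19 23 23 21
```

Which option is appropriate depends on the analysis; replacing missing values with a fixed number changes the data and should be a deliberate, documented choice.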
Getting Help
• Get help on a specific known function:
  - ?sqrt
  - ?write.csv
  - Lists all arguments
• Try to find a function on a particular topic:
  - ??logarithm
Analyses & Add-On Packages
• Some built-in analyses:
  - aov()       ANOVA
  - lm()        Linear regression
  - glm()       Generalized linear models (e.g., logistic)
  - cor.test()  Pearson correlation
  - t.test()    t-test
• Help function (?) will tell you about the arguments to these particular functions
Wrap-Up
• Can use R for:
  - Reading in data
  - Descriptive statistics
  - Subsetting data
  - Creating new variables