RStudio and Revolution Analytics have built additional functionality on top of base R.
Jan 02, 2016
Revolution Analytics has moved onto the radar screen for predictive analytics
http://www.forrester.com/pimages/rws/reprints/document/85601/oid/1-KWYFVB
Numeric Vector:
a <- c(1,2,5.3,6,-2,4)
Character Vector:
b <- c("one","two","three")
Matrix:
y <- matrix(1:20, nrow=5, ncol=4)
Dataframe:
d <- c(1,2,3,4)
e <- c("red", "white", "red", NA)
f <- c(TRUE,TRUE,TRUE,FALSE)
mydata <- data.frame(d,e,f)
names(mydata) <- c("ID","Color","Passed")
List:
w <- list(name="Fred", age=5.3)
Data Structures
Framework Source: Hadley Wickham
Actor Heights
1) Create Vectors of Actor Names, Heights, Date of Birth, Gender
2) Combine the 4 Vectors into a DataFrame
• Numeric: e.g. heights
• String: e.g. names
• Dates: e.g. "12-03-2013"
• Factor: e.g. gender
• Boolean: TRUE, FALSE
Variable Types
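The variable types listed above can be checked with class(). The following is a minimal sketch; the variable names (height, actor, dob, gender, passed) are hypothetical and chosen only for illustration. Note that R calls the Boolean type "logical".

```r
# One value of each variable type (names are illustrative only)
height <- 77                       # numeric
actor  <- "John"                   # string (character)
dob    <- as.Date("1930-10-27")    # date
gender <- factor("male")           # factor (categorical)
passed <- TRUE                     # Boolean (R calls this "logical")

# class() reports how R stores each value
types <- sapply(list(height, actor, dob, gender, passed), class)
print(types)
```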
• We use the c() function and put each value in quotation marks so that R knows it is string data.
• ?c opens the help page for c(): "Combine Values into a Vector or List"
Creating a Character / String Vector
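To see why the quotation marks matter, the short sketch below compares quoted and unquoted values passed to c(). The variable names (quoted, unquoted, mixed) are hypothetical.

```r
quoted   <- c("77", "66")    # quotation marks: stored as strings
unquoted <- c(77, 66)        # no quotation marks: stored as numbers
mixed    <- c("John", 77)    # mixing types: everything is coerced to character

class(quoted)     # "character"
class(unquoted)   # "numeric"
class(mixed)      # "character"
```

A vector holds a single type, so one quoted value is enough to turn the whole vector into strings.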
• Create a variable (aka object) called ActorNames:
ActorNames <- c("John", "Meryl", "Jennifer", "Andre")
Creating a Character / String Vector
• Create a variable called ActorHeights (inches):
ActorHeights <- c(77, 66, 70, 90)
Creating a Numeric Vector / Variable
• Use the as.Date() function:
ActorDoB <- as.Date(c("1930-10-27", "1949-06-22", "1990-08-15", "1946-05-19"))
• Each date has been entered as a text string (in quotations) in the appropriate format (yyyy-mm-dd).
• By enclosing these data in the as.Date() function, these strings are converted to date objects.
Creating a Date Variable
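Strings that are not already in yyyy-mm-dd order need a format argument. The sketch below assumes, purely for illustration, that a string such as "12-03-2013" means day-month-year; if it were month-day-year, the format string would be "%m-%d-%Y" instead.

```r
# Default: as.Date() expects yyyy-mm-dd
iso <- as.Date("2013-03-12")

# Other layouts need an explicit format (day-month-year is an assumption here)
dmy <- as.Date("12-03-2013", format = "%d-%m-%Y")

identical(iso, dmy)    # TRUE: both represent the same calendar day
class(dmy)             # "Date"
format(dmy, "%Y")      # extract the year as a string
```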
• Use the factor() function:
ActorGender <- c("male", "female", "female", "male")
class(ActorGender)
ActorGender <- factor(ActorGender)
Creating a Categorical / Factor Variable
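A short sketch of what factor() changes: the values look the same when printed, but R now stores a set of levels plus an integer code per element.

```r
ActorGender <- c("male", "female", "female", "male")
class(ActorGender)                  # "character" before conversion

ActorGender <- factor(ActorGender)
class(ActorGender)                  # "factor" after conversion

levels(ActorGender)                 # levels are sorted alphabetically by default
gender_counts <- table(ActorGender) # number of observations per level
print(gender_counts)
as.integer(ActorGender)             # underlying integer codes
```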
Actor.DF <- data.frame(Name=ActorNames, Height=ActorHeights, BirthDate=ActorDoB, Gender=ActorGender)
Vectors and DataFrames
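Putting the four vectors together, a self-contained sketch of the whole build might look like this. The stringsAsFactors = FALSE argument is an addition not shown on the slide; it keeps Name as character in R versions before 4.0, where strings were converted to factors by default.

```r
ActorNames   <- c("John", "Meryl", "Jennifer", "Andre")
ActorHeights <- c(77, 66, 70, 90)
ActorDoB     <- as.Date(c("1930-10-27", "1949-06-22", "1990-08-15", "1946-05-19"))
ActorGender  <- factor(c("male", "female", "female", "male"))

# Each vector becomes one column; all vectors must have the same length
Actor.DF <- data.frame(Name = ActorNames, Height = ActorHeights,
                       BirthDate = ActorDoB, Gender = ActorGender,
                       stringsAsFactors = FALSE)  # assumption: keep names as strings

str(Actor.DF)     # compact overview: one line per column with its type
names(Actor.DF)   # column names
```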
dim(Actor.DF)
Actor.DF[1,3] # row 1, column 3
Actor.DF[4,3] # row 4, column 3
Actor.DF[1,] # row 1, all columns
Actor.DF[2:3,] # rows 2 and 3, all columns
Actor.DF[,2] # column 2, all rows
Accessing Rows and Columns
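Columns can also be selected by name rather than by position, which tends to be more robust if the column order changes. This is a sketch on a reduced two-column version of the data frame; stringsAsFactors = FALSE is an assumption added for consistent behavior across R versions.

```r
Actor.DF <- data.frame(Name = c("John", "Meryl", "Jennifer", "Andre"),
                       Height = c(77, 66, 70, 90),
                       stringsAsFactors = FALSE)

Actor.DF$Height                  # column by name, returned as a vector
Actor.DF[["Height"]]             # same result
Actor.DF[, "Height"]             # same result, using [row, column] syntax
Actor.DF[2, "Name"]              # row 2 of the Name column
Actor.DF[, c("Name", "Height")]  # several columns: the result is a data frame
```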
• write.table(Actor.DF, "ActorData.txt", sep="\t", row.names = TRUE)
• write.csv(Actor.DF, "ActorData.csv")
Write / Create a File
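One way to check a written file is to read it back. The sketch below writes to a temporary directory (an assumption, so that nothing is left in the working directory) and uses row.names = FALSE so that the file contains no extra column of row numbers. The names csv_path and Actor.Check are hypothetical.

```r
Actor.DF <- data.frame(Name = c("John", "Meryl", "Jennifer", "Andre"),
                       Height = c(77, 66, 70, 90),
                       stringsAsFactors = FALSE)

# Write to a temporary location (illustrative choice)
csv_path <- file.path(tempdir(), "ActorData.csv")
write.csv(Actor.DF, csv_path, row.names = FALSE)

# Read the file back and compare with the original
Actor.Check <- read.csv(csv_path, stringsAsFactors = FALSE)
Actor.Check$Name == Actor.DF$Name
Actor.Check$Height == Actor.DF$Height
```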
Add New Variable: Height -> Feet, Inches
Actor.DF$Feet <- floor(Actor.DF$Height/12)
Actor.DF$Inches <- Actor.DF$Height - (Actor.DF$Feet*12)
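As a worked check of the arithmetic, the sketch below applies the same calculation to the four heights on a plain vector. R's integer-division (%/%) and remainder (%%) operators are shown as an equivalent alternative; they are not part of the original slide.

```r
Height <- c(77, 66, 70, 90)

Feet   <- floor(Height / 12)     # whole feet:        6 5 5 7
Inches <- Height - (Feet * 12)   # remaining inches:  5 6 10 6

# Equivalent alternative using integer division and remainder
Feet2   <- Height %/% 12
Inches2 <- Height %% 12

paste0(Feet, " ft ", Inches, " in")
```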
Logical Operators / Filter
Actor.DF$Height > 68
Actor.DF$Gender == "female"
?'['
Actor.DF[Actor.DF$Gender == "female",]
http://www.statmethods.net/management/operators.html
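Logical conditions can be combined with & (and) and | (or) before being used as a row filter. The following is a minimal sketch on a reduced version of the data frame; the names tall, TallWomen and MaleOrShort are hypothetical, and subset() is shown as an alternative to the bracket syntax.

```r
Actor.DF <- data.frame(Name = c("John", "Meryl", "Jennifer", "Andre"),
                       Height = c(77, 66, 70, 90),
                       Gender = factor(c("male", "female", "female", "male")),
                       stringsAsFactors = FALSE)

# A comparison returns one TRUE/FALSE value per row
tall <- Actor.DF$Height > 68
Actor.DF[tall, ]                 # keep only rows where tall is TRUE

# Combine conditions with & (and)
TallWomen <- Actor.DF[Actor.DF$Gender == "female" & Actor.DF$Height > 68, ]

# subset() lets you refer to columns without the Actor.DF$ prefix; | means "or"
MaleOrShort <- subset(Actor.DF, Gender == "male" | Height < 70)
```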