By Danny Blaker, data analyst at www.leanstartup.chat, blogger at www.dannyblaker.com

Basic Operators
+ add
- subtract
* multiply
/ divide
^ power of
%% modulo (remainder)
& and
| or
== equals
!= does not equal
> greater than
< less than
>= greater than or equal to
<= less than or equal to

Variables
x <- value assigns a value to the variable "x"
x prints the variable x to the console
x + y adds 2 variables together
z <- x + y adds 2 variables together and stores the result in a new variable (z)
Example: x <- 23 stores 23 in x; typing x then prints it

Classes
3 main classes:
12 numeric (number)
"hello" character (string)
TRUE logical (TRUE or FALSE)
class(x) returns the class of x
unclass(x) returns x with its class attribute removed

HELP
?my_function type "?" before any function name in the console to open its documentation
args(function) lists the arguments of a function
print(x) prints x
options() sets global options

Vectors
c() combines objects / elements into a vector
c() example: x <- c(1,2,3) creates a vector "x" containing the numbers 1, 2, and 3
typeof() shows the vector's type
length() shows the vector's length
range() shows the vector's range (its minimum and maximum)
Vectors can be added, subtracted, multiplied and divided element by element. The result can be stored in a new variable.
Example: x <- c(23,54); y <- c(12,14); z <- x + y
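The operator, variable, class and vector entries above can be tried together in a short base-R sketch. The numbers are arbitrary values chosen only for illustration.

```r
# Variables: <- assigns a value to a name
x <- c(23, 54)          # c() combines elements into a vector
y <- c(12, 14)

# Arithmetic on vectors works element by element
z <- x + y              # 35 68
power <- 2^3            # 8
remainder <- 7 %% 3     # 1

# Comparison operators are also element-wise and return logical vectors
bigger <- x > y         # TRUE TRUE

# class() reports the class of a value; typeof() reports the underlying storage type
classes <- c(class(12), class("hello"), class(TRUE))
storage <- typeof(1:3)  # "integer"

print(z)
print(classes)
print(range(c(5, 1, 9)))  # minimum and maximum: 1 9
```

Because R vectorises these operators, no loop is needed to add or compare two vectors of the same length.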
names() gets or sets the names of the elements in a vector
names() example: names(x) <- y
[] selects elements of a vector
[] example: x[3:6] selects elements 3 to 6 of vector x
vector_1 > vector_2 checks whether each element in vector_1 is greater than the corresponding element in vector_2

Libraries & Packages
install.packages("package_name") installs the package you specify
library("package_name") loads a package and attaches it to the workspace
search() lists the packages currently attached
library example: library(ggplot2) then qplot(pirates$swords, pirates$parrots) creates a plot of the swords and parrots columns of the "pirates" data frame

Matrices
matrix(1:9, byrow = TRUE, nrow = 3) creates a matrix containing the numbers 1-9 across 3 rows
colnames() assigns column names
rownames() assigns row names
dimnames = list(c("row names"), c("column names")) is another way to assign row and column names (an argument of matrix())
rbind() combines data by rows
cbind() combines data by columns
colSums() sums of matrix columns
rowSums() sums of matrix rows
x[1,1] selects the element in row 1, column 1 of matrix "x"
Entire matrices can be multiplied or divided like a regular vector, element by element (use %*% for matrix multiplication)

RGH the data pirate's R-cheatsheet

Factors
Factors store categorical variables as a fixed set of levels, which can be unordered or ordered
factor(x) makes x a factor
factor(x, ordered = FALSE) makes x an unordered factor
factor(x, ordered = TRUE, levels = c("1st", "2nd", "3rd")) makes x an ordered factor with levels 1st, 2nd and 3rd
levels(x) <- c("1", "2") renames the levels of factor x to "1" and "2"

Data.frames
data.frame() creates a data frame
data.frame example:
items <- c("parrot", "sword")
islands <- c("skull island", "treasure caverns")
pirate_brochure <- data.frame(items, islands) creates a 2x2 data frame stored in "pirate_brochure"
Data frame element selection example:
pirate_brochure[] selects elements of data frame "pirate_brochure"
pirate_brochure[1, ] selects row 1
pirate_brochure[ ,1] selects column 1
pirate_brochure[1,1] selects the observation in row 1, column 1
pirate_brochure[1:2, "items"] selects the first and second observations in column "items"
$ selects a column of a data frame
df$names selects the "names" column of data frame "df"
str(x) shows the structure of variable or data set "x"
dim(x) quickly shows the number of observations (rows) and variables (columns)
names() shows the top-level names of a list or data set
summary(x) instant summary of "x"
head(x) shows the start of data set "x"
tail(x) shows the end of data set "x"
str(head(x)) shows the structure of the start of data set "x"
str(tail(x)) shows the structure of the end of data set "x"
subset(x, subset = column_1 > 1) creates a subset of data frame "x" with all entries where "column_1" is greater than 1
filter(.data, ...) keeps the rows of a data frame (or database table) that match a condition (requires dplyr package)
order(x) returns the indices that would sort x; x[order(x$column_1), ] sorts data frame x by column_1
sort(x, decreasing = FALSE, ...) sorts vector x
append(x, y) appends vector y to the end of vector x

Basic Queries
mean() average
sum() sum
abs() absolute value
sd() standard deviation
sqrt() square root
norm() norm of a matrix
median() median value
dnorm() normal distribution density
pnorm() normal distribution function (cumulative probability)
qnorm() normal distribution quantile function
rnorm() random draws from a normal distribution
strsplit() splits strings
identical() checks whether two objects are exactly identical
cat() concatenates and prints
paste0() converts to strings and concatenates them with no separator
lm() fits a linear model
split() divides data into groups (unsplit() reassembles them)

If statement
if (condition1) {
  expr1
} else if (condition2) {
  expr2
} else if (condition3) {
  expr3
} else {
  expr4
}

While loop
while (condition) {
  expr
}

For loop
for (var in seq) {
  expr
}
break exits a loop
next skips to the next loop iteration

Functions
Function syntax:
my_fun <- function(arg1, arg2) {
  body
}
return(x) returns x
is.na() flags missing elements (sum(is.na(x)) counts how many are missing)
warning(..., call. = FALSE) issues a warning message
stop(..., call. = TRUE) stops execution with an error message
message() issues a diagnostic message
any() checks whether at least one value is TRUE
lapply(X, FUN, ...) iterates over X with a function and returns a list
sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) iterates over X with a function and returns a vector (when possible)
vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE) iterates over X with a function and returns output of a pre-specified type
replicate(n, expr, simplify = "array") sapply for repeated evaluation of an expression
mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE) multivariate version of sapply

Regular Expressions
grep() returns a vector of indices of the character strings that contain a specific pattern
grepl() returns TRUE when a pattern is found in the corresponding character string
sub() search and replace (first match only)
gsub() search and replace (ALL matches)
^ matches content at the start of a string
$ matches content at the end of a string
.* matches any character zero or more times
\\ escapes a character (e.g. "\\." matches a literal ".")
Regex syntax example:
pirates <- c("[email protected]", "[email protected]")
Replace all "pirateparrot"s with "piratesword"s:
sub("@.*\\.com$", "@piratesword.com", pirates)

Dates & Times
Sys.Date() current date
Sys.time() current time
%Y 4-digit year (2016)
%y 2-digit year (16)
%m 2-digit month (01)
%d 2-digit day of the month (20)
%A weekday (Monday)
%a abbreviated weekday (Mon)
%B month (September)
%b abbreviated month (Sep)

Graphs
plot(x, y, ...) basic scatter plot
hist() histogram
boxplot() box plot
density() kernel density estimate (draw it with plot(density(x)))
dotchart() dot plot
barplot() bar plot
lines() adds lines to an existing plot
pie() pie chart

List Subsetting
list() creates a list
list1[[x]] returns element x in list1
list1[[x]][[1]] returns the first element inside the element called x in list1
list1[[x]][[1]][[2]] returns the second element inside the first element inside the element called x in list1

purrr
map(.x, .f, ...) applies a function to each element of a vector or list
pmap(.l, .f, ...) maps over multiple inputs in parallel
%>% pipe: "x %>% f(y)" is the same as "f(x, y)"

Importing Data
Utils: read.table(), read.csv(), read.delim()
readr: read_delim(), read_csv(), read_tsv()
skip (readr argument) skips rows at the beginning of the file
n_max (readr argument) maximum number of rows to import
fread() fast import (requires data.table package)
excel_sheets() prints sheet names (requires readxl package)
read_excel() imports data from a spreadsheet (requires readxl package)
read.xls() imports from .xls (requires gdata package)
loadWorkbook() imports a workbook (requires XLConnect package)
getSheets() lists the sheets in a workbook (XLConnect)
readWorksheet() imports sheets (XLConnect)
read_sas("file_name.sas7bdat") imports a SAS file (requires haven package)

Importing Data PACKAGES
DBI base package
RMySQL MySQL
ROracle Oracle
RPostgreSQL PostgreSQL
dbConnect() (DBI package) connects to a database
dbConnect example:
dbConnect(RMySQL::MySQL(), dbname = "db", host = "db.amazonaws.com", port = 0000, user = "test", password = "1234")
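The Matrices entries can be tried with a small base-R sketch. The row and column names (r1, c1, and so on) are hypothetical labels chosen for illustration; the sketch also contrasts element-wise `*` with true matrix multiplication `%*%`.

```r
# 3 x 3 matrix filled row by row, with row and column names set via dimnames
m <- matrix(1:9, byrow = TRUE, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

row_totals <- rowSums(m)   # r1 = 6, r2 = 15, r3 = 24
col_totals <- colSums(m)   # c1 = 12, c2 = 15, c3 = 18
corner <- m[1, 1]          # 1
doubled <- m * 2           # element-wise, like a regular vector
product <- m %*% diag(3)   # matrix multiplication (by the 3 x 3 identity here)

# rbind()/cbind() add rows/columns; the argument name becomes the row/column name
stacked <- rbind(m, r4 = c(10L, 11L, 12L))
widened <- cbind(m, c4 = c(0L, 0L, 0L))
print(stacked)
```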
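The Factors entries describe ordered and unordered factors. Here is a minimal base-R sketch of both; the vectors `ranks` and `role` are hypothetical example data.

```r
# Ordered factor: the levels argument fixes both the allowed values and their order
ranks <- c("2nd", "1st", "3rd", "1st")
ranks_f <- factor(ranks, ordered = TRUE, levels = c("1st", "2nd", "3rd"))
print(levels(ranks_f))           # "1st" "2nd" "3rd"
print(ranks_f[1] < ranks_f[3])   # TRUE, because 2nd comes before 3rd
counts <- table(ranks_f)         # 1st: 2, 2nd: 1, 3rd: 1

# Unordered factor: levels default to the sorted unique values ("cook", "mate")
role <- factor(c("mate", "cook", "mate"))
levels(role) <- c("Cook", "Mate")   # renames the levels in their existing order
print(role)
```

Note that `levels(x) <- ...` relabels existing levels by position. It does not reorder or recode the data.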
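The Data.frames entries can be combined into one runnable sketch built on the cheatsheet's `pirate_brochure` example. The `price` column is a hypothetical addition used only to demonstrate `subset()` and `order()`. `stringsAsFactors = FALSE` is set explicitly so the text columns stay character in older R versions too.

```r
items <- c("parrot", "sword")
islands <- c("skull island", "treasure caverns")
pirate_brochure <- data.frame(items, islands, stringsAsFactors = FALSE)

str(pirate_brochure)
dims <- dim(pirate_brochure)                  # 2 rows, 2 columns
first_item <- pirate_brochure[1, "items"]     # "parrot"
both_items <- pirate_brochure[1:2, "items"]   # "parrot" "sword"
islands_col <- pirate_brochure$islands        # same as pirate_brochure[ , 2]

# Hypothetical price column, added only to demonstrate subset() and order()
pirate_brochure$price <- c(30, 10)
cheap <- subset(pirate_brochure, subset = price < 20)         # only the sword row
by_price <- pirate_brochure[order(pirate_brochure$price), ]   # sword first, then parrot
```

`order()` returns row positions rather than sorted values, which is why it is used inside `[ , ]` to reorder whole rows.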
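Several Basic Queries functions are easiest to understand from their results on small inputs. The vector `v` and the points fed to `lm()` below are arbitrary illustrative data, not from the cheatsheet.

```r
v <- c(2, 4, 4, 4, 5, 5, 7, 9)   # arbitrary example values
avg <- mean(v)      # 5
mid <- median(v)    # 4.5 (average of the two middle values, 4 and 5)
spread <- sd(v)     # sample standard deviation, sqrt(32 / 7)

p_half <- pnorm(0)          # 0.5: half of the standard normal lies below 0
z_975 <- qnorm(0.975)       # about 1.96

parts <- strsplit("parrot,sword,map", ",")[[1]]   # "parrot" "sword" "map"
labels <- paste0("pirate_", 1:2)                  # "pirate_1" "pirate_2"
groups <- split(v, v > 4)    # list with elements "FALSE" (2 4 4 4) and "TRUE" (5 5 7 9)

# lm() on points lying exactly on y = 1 + 2x recovers intercept 1 and slope 2
xs <- 1:5
ys <- 1 + 2 * xs
fit <- lm(ys ~ xs)
coefs <- unname(coef(fit))
```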
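The if / while / for templates and the Functions entries fit together in one sketch. `classify()` and `sum_until_negative()` are hypothetical helper names made up for this illustration. They show `if`/`else if`, `stop()`, `next`, `break` and a `while` loop, and the sketch ends with counting missing values via `sum(is.na(x))`.

```r
# Hypothetical helper: label a number, using if / else if / else
classify <- function(value) {
  if (is.na(value)) {
    stop("value must not be NA", call. = FALSE)
  } else if (value < 0) {
    return("negative")
  } else if (value == 0) {
    return("zero")
  } else {
    return("positive")
  }
}

# Hypothetical helper: add up values, skipping NAs (next) and stopping at the first negative (break)
sum_until_negative <- function(values) {
  total <- 0
  for (v in values) {
    if (is.na(v)) next
    if (v < 0) break
    total <- total + v
  }
  total
}

# while loop: keep doubling until the value reaches at least 100
n <- 1
while (n < 100) {
  n <- n * 2
}

missing_count <- sum(is.na(c(1, NA, 3, NA)))   # is.na() flags NAs; sum() counts them
print(classify(-2))
print(sum_until_negative(c(1, NA, 2, -1, 10)))
print(n)
```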
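The apply-family entries differ mainly in the shape of what they return. The sketch below shows each one in base R on a small hypothetical list `scores`.

```r
scores <- list(a = c(1, 2, 3), b = c(4, 5, 6))   # arbitrary example data

means_list <- lapply(scores, mean)    # always a list: list(a = 2, b = 5)
means_vec <- sapply(scores, mean)     # simplified to a named numeric vector: a = 2, b = 5
sizes <- vapply(scores, length, FUN.VALUE = integer(1))   # each result must be one integer
sums <- mapply(function(x, y) x + y, 1:3, 4:6)           # pairs up elements: 5 7 9

set.seed(1)                                  # reproducible random draws
sim_means <- replicate(5, mean(rnorm(10)))   # 5 means of 10 normal draws each
```

`vapply()` is the stricter choice when the output type is known in advance, because it errors if `FUN` returns anything other than `FUN.VALUE`'s type and length.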
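The Regular Expressions entries can be exercised on a vector of email addresses, in the spirit of the cheatsheet's example. The addresses below are hypothetical stand-ins made up for this sketch.

```r
# Hypothetical addresses used only for illustration
emails <- c("jack@pirateparrot.com", "anne@pirateparrot.com", "bill@navy.org")

has_parrot <- grepl("pirateparrot", emails)   # TRUE TRUE FALSE
parrot_idx <- grep("pirateparrot", emails)    # 1 2
ends_in_org <- grepl("\\.org$", emails)       # $ anchors at the end; \\. is a literal dot
starts_with_j <- grepl("^j", emails)          # ^ anchors at the start

# sub() replaces the first match in each string; ".*" matches any run of characters.
# "bill@navy.org" has no ".com" ending, so it is returned unchanged.
new_emails <- sub("@.*\\.com$", "@piratesword.com", emails)

first_only <- sub("a", "A", "banana")     # "bAnana"
all_matches <- gsub("a", "A", "banana")   # "bAnAnA"
```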
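The Dates & Times format codes are used with `format()` to print dates and with `as.Date(..., format = )` to parse them. This is a base-R sketch on an arbitrary example date. The name codes (%A, %a, %B, %b) depend on the system locale, so only the numeric codes have fixed output.

```r
d <- as.Date("2016-09-20")   # arbitrary example date

year4 <- format(d, "%Y")     # "2016"
year2 <- format(d, "%y")     # "16"
month <- format(d, "%m")     # "09"
day <- format(d, "%d")       # "20"
long_form <- format(d, "%A %d %B %Y")   # weekday and month names follow the locale

parsed <- as.Date("20/09/2016", format = "%d/%m/%Y")   # parse a day/month/year string
days_left <- as.numeric(as.Date("2016-12-25") - d)      # 96 days

today <- Sys.Date()   # class "Date"
now <- Sys.time()     # class "POSIXct"
```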
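The List Subsetting entries chain `[[ ]]` to reach into nested lists. In this sketch `list1` is hypothetical example data, and `x` is used as an element name.

```r
# Nested list: element "x" is itself a list whose first element is a numeric vector
list1 <- list(x = list(c(10, 20, 30), "parrot"), y = "sword")

inner <- list1[["x"]]                        # the inner list
first_inside <- list1[["x"]][[1]]            # c(10, 20, 30)
second_of_first <- list1[["x"]][[1]][[2]]    # 20
single_bracket <- list1["y"]                 # [ ] keeps the list wrapper; [[ ]] extracts the element
```

The difference between `[ ]` and `[[ ]]` matters here: `list1["y"]` is still a list of length 1, while `list1[["y"]]` is the character string itself.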