The tidyverse
Hadley Wickham (@hadleywickham), Chief Scientist, RStudio
September 2016

The tidyverse - Meetupfiles.meetup.com/1406240/Tidyverse.pdfThe tidyverse September 2016. Tidy Import Surprises, but doesn't scale Consistent way of Create new variables & new summaries

Apr 12, 2018

Import
Tidy: Consistent way of storing data
Transform: Create new variables & new summaries
Visualise: Surprises, but doesn't scale
Model: Scales, but doesn't (fundamentally) surprise
Communicate
Program


No matter how complex and polished the individual operations are, it is often the quality of the glue that most directly determines the power of the system. — Hal Abelson

The tidy tools manifesto

Import: readr, readxl, haven, httr, jsonlite, DBI, rvest, xml2
Tidy: tibble, tidyr
Transform: dplyr, forcats, hms, lubridate, stringr
Visualise: ggplot2
Model: broom, modelr
Program: purrr, magrittr

http://r4ds.had.co.nz

tidyverse

1. Share data structures.
2. Compose simple pieces.
3. Embrace FP.
4. Write for humans.

1. Share data structures


1. Put each dataset in a data frame.

2. Put each variable in a column.

Tidy data

# A tibble: 5,769 × 22
    iso2  year   m04  m514  m014 m1524 m2534 m3544 m4554 m5564   m65    mu   f04  f514  f014 f1524
   <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1    AD  1989    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 2    AD  1990    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 3    AD  1991    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 4    AD  1992    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 5    AD  1993    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 6    AD  1994    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 7    AD  1996    NA    NA     0     0     0     4     1     0     0    NA    NA    NA     0     1
 8    AD  1997    NA    NA     0     0     1     2     2     1     6    NA    NA    NA     0     1
 9    AD  1998    NA    NA     0     0     0     1     0     0     0    NA    NA    NA    NA    NA
10    AD  1999    NA    NA     0     0     0     1     1     0     0    NA    NA    NA     0     0
11    AD  2000    NA    NA     0     0     1     0     0     0     0    NA    NA    NA    NA    NA
12    AD  2001    NA    NA     0    NA    NA     2     1    NA    NA    NA    NA    NA    NA    NA
13    AD  2002    NA    NA     0     0     0     1     0     0     0    NA    NA    NA     0     1
14    AD  2003    NA    NA     0     0     0     1     2     0     0    NA    NA    NA     0     1
15    AD  2004    NA    NA     0     0     0     1     1     0     0    NA    NA    NA     0     0
16    AD  2005     0     0     0     0     1     1     0     0     0     0     0     0     0     1
17    AD  2006     0     0     0     1     1     2     0     1     1     0     0     0     0     0
# ... with 5,752 more rows, and 6 more variables: f2534 <int>, f3544 <int>, f4554 <int>,
#   f5564 <int>, f65 <int>, fu <int>

Messy data has a varied “shape”

What are the variables in this dataset? (Hint: f = female, u = unknown, 1524 = 15-24)

library(readr)  # read_csv()
library(tidyr)  # gather(), separate()
library(dplyr)  # arrange(), rename(), %>%

read_csv("tb.csv") %>%
  gather(
    m04:fu,
    key = demo,
    value = n,
    na.rm = TRUE
  ) %>%
  separate(demo, c("sex", "age"), 1) %>%
  arrange(iso2, year, sex, age) %>%
  rename(country = iso2)

tidyr helps you tidy your messy data
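
For readers on tidyr 1.0.0 or later, the same reshaping can also be written with pivot_longer(). The sketch below is an illustration only: it uses a tiny stand-in table (two rows and four count columns copied from the printout above) rather than the full tb.csv, and the object names tb and tidy_tb are made up for this example.

```r
library(tibble)
library(tidyr)

# Stand-in for tb.csv: two rows and four of the count columns from the slide
tb <- tribble(
  ~iso2, ~year, ~m014, ~m1524, ~f014,       ~f1524,
  "AD",  1996L, 0L,    0L,     0L,          1L,
  "AD",  1998L, 0L,    0L,     NA_integer_, NA_integer_
)

# Split each column name after its first character into sex and age,
# stack the counts into a single column n, and drop missing counts
tidy_tb <- pivot_longer(
  tb,
  cols = m014:f1524,
  names_to = c("sex", "age"),
  names_sep = 1,
  values_to = "n",
  values_drop_na = TRUE
)

tidy_tb
```

Every row of tidy_tb is one country, year, sex and age combination, which is the uniform shape the next slide shows.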

# A tibble: 35,750 × 5
   country  year   sex   age     n
     <chr> <int> <chr> <chr> <int>
 1      AD  1996     f   014     0
 2      AD  1996     f  1524     1
 3      AD  1996     f  2534     1
 4      AD  1996     f  3544     0
 5      AD  1996     f  4554     0
 6      AD  1996     f  5564     1
 7      AD  1996     f    65     0
 8      AD  1996     m   014     0
 9      AD  1996     m  1524     0
10      AD  1996     m  2534     0
# ... with 35,740 more rows

Tidy data has a uniform “shape”

Sometimes you don’t have variables & cases

strings, dates, factors, vectors, matrices, xml, HTTP requests, HTTP response

http://simplystatistics.org/2016/02/17/non-tidy-data/

What if you have a mix of object types?

Cross-validation
Training data: data frame
Test data: data frame
Model: lm
Predictions: vector
RMSE: scalar

Use a tibble with list-columns!

# A tibble: 100 x 5
            train           test   .id      mod      rmse
           <list>         <list> <chr>   <list>     <dbl>
 1 <S3: resample> <S3: resample>   001 <S3: lm> 0.5661605
 2 <S3: resample> <S3: resample>   002 <S3: lm> 0.2399357
 3 <S3: resample> <S3: resample>   003 <S3: lm> 3.5482986
 4 <S3: resample> <S3: resample>   004 <S3: lm> 0.2396810
 5 <S3: resample> <S3: resample>   005 <S3: lm> 0.1591336
 6 <S3: resample> <S3: resample>   006 <S3: lm> 0.1934869
 7 <S3: resample> <S3: resample>   007 <S3: lm> 0.2697834
 8 <S3: resample> <S3: resample>   008 <S3: lm> 0.4910886
 9 <S3: resample> <S3: resample>   009 <S3: lm> 1.7002645
10 <S3: resample> <S3: resample>   010 <S3: lm> 0.2047787
# ... with 90 more rows
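
A rough sketch of how a table with this structure could be built follows. It is not the code behind the slide: the mtcars data, the mpg ~ wt model, the five random splits and the helper column test_rows are all assumptions chosen for illustration, and the training and test sets are stored as plain data frames rather than the resample objects shown above.

```r
library(tibble)
library(dplyr)
library(purrr)

set.seed(1)
n_splits <- 5  # the slide shows 100; kept small here

cv <- tibble(.id = sprintf("%03d", seq_len(n_splits))) %>%
  mutate(
    # Row indices held out for testing in each split
    test_rows = map(.id, ~ sample(nrow(mtcars), 6)),
    # List-columns: each cell holds a whole data frame
    train = map(test_rows, ~ mtcars[-.x, ]),
    test  = map(test_rows, ~ mtcars[.x, ]),
    # List-column of fitted models, one per split
    mod   = map(train, ~ lm(mpg ~ wt, data = .x)),
    # Ordinary numeric column: root-mean-squared error on the held-out rows
    rmse  = map2_dbl(mod, test, ~ sqrt(mean((.y$mpg - predict(.x, newdata = .y))^2)))
  )

cv
```

Keeping the data, the model and its score in the same row means they cannot drift out of sync when the table is filtered or sorted.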

Your turn!

df <- data.frame(xyz = "a")
# What does this return?
df$x
#> [1] a
#> Levels: a

Two surprises: partial name matching & stringsAsFactors
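
The output above reflects R as it behaved in 2016. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the sketch below sets the argument explicitly to reproduce both surprises on any version. The variable names partial and exact are made up for this illustration.

```r
# Ask for the old default explicitly so the result does not depend on the R version
df <- data.frame(xyz = "a", stringsAsFactors = TRUE)

# Surprise 1: $ does partial matching, so `x` silently finds `xyz`
partial <- df$x

# Surprise 2: the string was converted to a factor
is.factor(partial)

# [[ matches names exactly by default, so the same lookup finds nothing
exact <- df[["x"]]
is.null(exact)
```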

Two important tensions for understanding base R

Interactive exploration vs. programming
Conservative vs. utopian

library(tibble)

df <- tibble(xyz = "a")

df$xyz
#> [1] "a"

is.data.frame(df[, "xyz"])
#> [1] TRUE

df$x
#> Warning: Unknown column 'x'
#> NULL

Tibbles are data frames that are lazy & surly

data.frame(x = list(1:2, 3:5))
#> Error: arguments imply differing number
#> of rows: 2, 3

data.frame(x = I(list(1:2, 3:5)))
#>         x
#> 1    1, 2
#> 2 3, 4, 5

tibble(x = list(1:2, 3:5))
#> # A tibble: 2 x 1
#>           x
#>      <list>
#> 1 <int [2]>
#> 2 <int [3]>

And work better with list-columns
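
Once a list-column exists, ordinary columns can be derived from it with purrr's typed map functions. This is a small sketch under the assumption that dplyr and purrr are installed; the column names n and total are made up for the example.

```r
library(tibble)
library(dplyr)
library(purrr)

df <- tibble(x = list(1:2, 3:5)) %>%
  mutate(
    n     = map_int(x, length),  # number of elements in each cell
    total = map_dbl(x, sum)      # sum of each cell
  )

df
```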

2. Compose simple pieces


Goal: Solve complex problems by combining uniform pieces.


https://www.flickr.com/photos/brunurb/13129057003

http://brickartist.com/gallery/pc-magazine-computer/. CC-BY-NC

magrittr::`%>%`

foo_foo <- little_bunny()

bop_on(
  scoop_up(
    hop_through(foo_foo, forest),
    field_mouse
  ),
  head
)

# vs
foo_foo %>%
  hop_through(forest) %>%
  scoop_up(field_mouse) %>%
  bop_on(head)
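
The bunny functions are illustrative rather than runnable, so here is a small sketch of the same comparison with base R functions standing in for them. The choice of sum(), sqrt() and round() is an assumption made purely for illustration.

```r
library(magrittr)

x <- c(1, 4, 9)

# Nested calls have to be read from the inside out
nested <- round(sqrt(sum(x)), 1)

# The pipe passes the left-hand side as the first argument of the next call,
# so the steps read top to bottom in the order they happen
piped <- x %>%
  sum() %>%
  sqrt() %>%
  round(1)

identical(nested, piped)
```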

library(nycflights13)
library(dplyr)
library(ggplot2)

flights %>%
  # flights has year, month and day columns but no date column, so derive one
  mutate(date = as.Date(ISOdate(year, month, day))) %>%
  group_by(date) %>%
  summarise(n = n()) %>%
  ggplot(aes(date, n)) +
  geom_line()

Consistency across packages is important

😧

ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  geom_line() +
  ggsave("mtcars.pdf")  # marked ✗ on the slide

And ggplot2 is not even internally consistent

ggsave(
  "mtcars.pdf",
  ggplot(mtcars, aes(mpg, wt)) +
    geom_point() +
    geom_line()
)

And ggplot2 is not even internally consistent

😱

# devtools::install_github("hadley/ggplot1")
library(ggplot1)

ggsave(
  ggpoint(
    ggplot(
      mtcars,
      list(x = mpg, y = wt)
    )
  ),
  "mtcars.pdf",
  width = 8,
  height = 6
)

ggplot1 had a tidier API than ggplot2!

library(ggplot1)

mtcars %>%
  ggplot(list(x = mpg, y = wt)) %>%
  ggpoint() %>%
  ggsave("mtcars.pdf", width = 8, height = 6)

So you can use the pipe with ggplot1

[Figure: scatterplot of the mtcars data, with mpg on the x-axis and wt on the y-axis]

library(rvest)
library(purrr)
library(readr)
library(dplyr)
library(lubridate)

read_html("https://www.massshootingtracker.org/data") %>%
  html_nodes("a[href^='https://docs.goo']") %>%
  html_attr("href") %>%
  map_df(read_csv) %>%
  mutate(date = mdy(date)) -> shootings

One small example from Bob Rudis https://rud.is/b/2016/07/26

3. Embrace FP

Why are for loops “bad”? Answered with cupcakes

1 cup flour
a scant ¾ cup sugar
1 ½ t baking powder
3 T unsalted butter
½ cup whole milk
1 egg
¼ t pure vanilla extract

Preheat oven to 350°F. Put the flour, sugar, baking powder, salt, and butter in a freestanding electric mixer with a paddle attachment and beat on slow speed until you get a sandy consistency and everything is combined. Whisk the milk, egg, and vanilla together in a pitcher, then slowly pour about half into the flour mixture, beat to combine, and turn the mixer up to high speed to get rid of any lumps. Turn the mixer down to a slower speed and slowly pour in the remaining milk mixture. Continue mixing for a couple of more minutes until the batter is smooth but do not overmix. Spoon the batter into paper cases until 2/3 full and bake in the preheated oven for 20-25 minutes, or until the cake bounces back when touched.

Vanilla cupcakes The hummingbird bakery cookbook

¾ cup + 2T flour
2 ½ T cocoa powder
a scant ¾ cup sugar
1 ½ t baking powder
3 T unsalted butter
½ cup whole milk
1 egg
¼ t pure vanilla extract

Preheat oven to 350°F. Put the flour, cocoa, sugar, baking powder, salt, and butter in a freestanding electric mixer with a paddle attachment and beat on slow speed until you get a sandy consistency and everything is combined. Whisk the milk, egg, and vanilla together in a pitcher, then slowly pour about half into the flour mixture, beat to combine, and turn the mixer up to high speed to get rid of any lumps. Turn the mixer down to a slower speed and slowly pour in the remaining milk mixture. Continue mixing for a couple of more minutes until the batter is smooth but do not overmix. Spoon the batter into paper cases until 2/3 full and bake in the preheated oven for 20-25 minutes, or until the cake bounces back when touched.

Chocolate cupcakes The hummingbird bakery cookbook


120g flour
140g sugar
1.5 t baking powder
40g unsalted butter
120ml milk
1 egg
0.25 t pure vanilla extract

Preheat oven to 170°C. Put the flour, sugar, baking powder, salt, and butter in a freestanding electric mixer with a paddle attachment and beat on slow speed until you get a sandy consistency and everything is combined. Whisk the milk, egg, and vanilla together in a pitcher, then slowly pour about half into the flour mixture, beat to combine, and turn the mixer up to high speed to get rid of any lumps. Turn the mixer down to a slower speed and slowly pour in the remaining milk mixture. Continue mixing for a couple of more minutes until the batter is smooth but do not overmix. Spoon the batter into paper cases until 2/3 full and bake in the preheated oven for 20-25 minutes, or until the cake bounces back when touched.

Vanilla cupcakes

1. Convert units

The hummingbird bakery cookbook

120g flour
140g sugar
1.5 t baking powder
40g butter
120ml milk
1 egg
0.25 t vanilla

Beat flour, sugar, baking powder, salt, and butter until sandy. Whisk milk, egg, and vanilla. Mix half into flour mixture until smooth (use high speed). Beat in remaining half. Mix until smooth. Bake 20-25 min at 170°C.

Vanilla cupcakes

2. Rely on domain knowledge

The hummingbird bakery cookbook


Beat dry ingredients + butter until sandy. Whisk together wet ingredients. Mix half into dry until smooth (use high speed). Beat in remaining half. Mix until smooth. Bake 20-25 min at 170°C.

Vanilla cupcakes

3. Use variables

120g flour
140g sugar
1.5 t baking powder
40g butter
120ml milk
1 egg
0.25 t vanilla

The hummingbird bakery cookbook

Cupcakes

4. Extract out common code

Vanilla: 120g flour, 140g sugar, 1.5t baking powder, 40g butter, 120ml milk, 1 egg, 0.25 t vanilla
Chocolate: 100g flour, 20g cocoa, 140g sugar, 1.5t baking powder, 40g butter, 120ml milk, 1 egg, 0.25 t vanilla

Beat dry ingredients + butter until sandy. Whisk together wet ingredients. Mix half into dry until smooth (use high speed). Beat in remaining half. Mix until smooth. Bake 20-25 min at 170°C.

out1 <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
  out1[[i]] <- mean(mtcars[[i]], na.rm = TRUE)
}

out2 <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
  out2[[i]] <- median(mtcars[[i]], na.rm = TRUE)
}

What do these for loops do?

For loops emphasise the objects, not the actions

library(purrr)

means <- map_dbl(mtcars, mean)
medians <- map_dbl(mtcars, median)

Functional programming emphasises the actions
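
One way to see the step from the for loops to map_dbl() is to write the loop once and pass the action in as an argument, which mirrors the "extract out common code" step of the cupcake analogy. The helper col_summary() below is hypothetical and written only for this illustration.

```r
library(purrr)

# Hypothetical helper: the loop is written once, the action `fun` is an argument
col_summary <- function(df, fun, ...) {
  out <- vector("double", length(df))
  for (i in seq_along(df)) {
    out[[i]] <- fun(df[[i]], ...)
  }
  out
}

loop_means <- col_summary(mtcars, mean, na.rm = TRUE)

# map_dbl() does the same job and also keeps the column names
map_means <- map_dbl(mtcars, mean, na.rm = TRUE)

all.equal(loop_means, unname(map_means))
```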

library(tibble)
library(dplyr)
library(purrr)

sim <- tribble(
  ~f,      ~params,
  "runif", list(min = -1, max = 1),
  "rnorm", list(sd = 5),
  "rpois", list(lambda = 10)
)
sim %>%
  mutate(sim = invoke_map(f, params, n = 10))

Teaser: simulation
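
invoke_map() has been deprecated in more recent purrr releases. A possible equivalent, sketched here with map2() and base R's do.call(), is shown below; building the table with tibble() instead of tribble(), the seed, and the name sims are choices made for this illustration rather than anything from the talk.

```r
library(tibble)
library(dplyr)
library(purrr)

set.seed(2016)

sim <- tibble(
  f = c("runif", "rnorm", "rpois"),
  params = list(
    list(min = -1, max = 1),
    list(sd = 5),
    list(lambda = 10)
  )
)

# For each row, call the function named in `f` with n = 10 plus its own parameters
sims <- sim %>%
  mutate(sim = map2(f, params, ~ do.call(.x, c(list(n = 10), .y))))

sims
```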

library(tibble)
library(dplyr)
library(purrr)
library(ggplot2)  # for the mpg dataset

reports <- tibble(
  class = unique(mpg$class),
  filename = paste0("fuel-economy-", class, ".html"),
  params = map(class, ~ list(my_class = .))
)

reports %>%
  select(output_file = filename, params) %>%
  pwalk(rmarkdown::render, input = "fuel-economy.Rmd")

Teaser: saving parameterised reports
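
The mechanics of pwalk() can be tried without an R Markdown file. In this sketch each row of a table becomes one call to a function, with the column names matched to argument names and any extra arguments shared by every call. The function write_report(), the jobs table and the text written are all made up for illustration, and the files go to a temporary directory.

```r
library(tibble)
library(purrr)

out_dir <- tempfile("reports-")
dir.create(out_dir)

# One row per report; column names match the arguments of write_report()
jobs <- tibble(
  path = file.path(out_dir, paste0("fuel-economy-", c("compact", "suv"), ".txt")),
  text = c("Report for compact cars", "Report for SUVs")
)

# Hypothetical stand-in for rmarkdown::render()
write_report <- function(path, text, header) {
  writeLines(c(header, text), path)
}

# Called once per row, purely for its side effect of writing a file
pwalk(jobs, write_report, header = "Fuel economy")

file.exists(jobs$path)
```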

4. Write for humans


Programs must be written for people to read, and only incidentally for machines to execute. — Hal Abelson

tibble, lubridate, forcats, magrittr

filter, mutate, summarise, arrange, select


Conclusion

1. Share data structures.
2. Compose simple pieces.
3. Embrace FP.
4. Write for humans.


My goal is to make a pit of success

http://blog.codinghorror.com/falling-into-the-pit-of-success/

install.packages("tidyverse")
library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats

Gotta install them all

Import: readr, readxl, haven, httr, jsonlite, DBI, rvest, xml2
Tidy: tibble, tidyr
Transform: dplyr, forcats, hms, lubridate, stringr
Visualise: ggplot2
Model: broom, modelr, ???
Program: purrr, magrittr

http://r4ds.had.co.nz

tidyverse

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/