Reproducible research (and literate programming) in R

Post on 15-Apr-2017

Lenhard Group Retreat - October 2015

Reproducible research in R

Liz Ing-Simmons

The worst kind of collaborator

(This is good motivation for reproducibility, too)

What is reproducibility?

• Replicable:
  – Results can be reproduced from an independent analysis (different lab, model system, software…)

• Reproducible:
  – Results can be reproduced using your code and data

• Both are important!
  – Making analysis reproducible means being explicit about what you’ve done, which makes it easier to replicate
  – It also has other benefits (more on this later)

• Partial reproducibility is better than none

Or maybe the other way round depending on who you ask…

Reproducibility tools for R

• packrat
  – Manage and track dependencies for projects

• switchr
  – Switch between different package libraries

• knitr
  – Report generation from combined text and code

• R Markdown (rmarkdown package)
  – Simple formatting syntax for text and code blocks

You can use knitr with other languages too!
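As a quick check of the point about other languages, the sketch below lists the language engines that an installed copy of knitr registers. It assumes knitr is installed; actually running a chunk in another language also needs that language's interpreter on your machine.

```r
library(knitr)

# knit_engines holds the functions knitr uses to run non-R chunks
engines <- names(knit_engines$get())
print(sort(engines))

# Check whether two commonly used engines are registered
c("python", "bash") %in% engines
```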

Literate programming

• Documents that combine code, results, and documentation that tells you what the code is doing

• Encourages you to be explicit about what you’re trying to do
  – Can make it easier to spot mistakes
  – Better code: more readable, more understandable, more reusable

• Bonus: make pretty reports for your collaborators

• Some journals now encourage you to submit code as supplementary material

Anatomy of an Rmarkdown document

YAML header: title, author, document options

Code block: enclosed in ```, with language and options specified

Text: including section headers and links

A sample .Rmd

• In RStudio, you can use the ‘Knit HTML’ button (or PDF)

• In an R session, use knitr::knit2html() (or rmarkdown::render() for R Markdown v2 documents)
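The slide's sample file is not part of this transcript, so here is a minimal sketch of what such a document might look like and how it could be knitted from an R session. The title, author, chunk label and text are placeholders, and the example uses knitr::knit() to produce Markdown so that it runs without pandoc.

```r
library(knitr)

# Lines of a minimal .Rmd: YAML header, text with a link, and one R code block.
# Title, author and chunk label are placeholders, not from the original slides.
rmd_lines <- c(
  "---",
  "title: \"A sample report\"",
  "author: \"Your Name\"",
  "output: html_document",
  "---",
  "",
  "## Speed of cars",
  "",
  "The `cars` data set ships with R; see [the knitr site](https://yihui.name/knitr).",
  "",
  "```{r summary-cars, echo=TRUE}",
  "summary(cars$speed)",
  "```"
)

rmd_file <- tempfile(fileext = ".Rmd")
md_file <- sub("\\.Rmd$", ".md", rmd_file)
writeLines(rmd_lines, rmd_file)

# knit() runs the code and writes a Markdown file with the results woven in
knit(rmd_file, output = md_file, quiet = TRUE)
md <- readLines(md_file)
cat(md, sep = "\n")
```

If rmarkdown and pandoc are available, rmarkdown::render(rmd_file) should turn the same file into the HTML report named in the YAML header.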

Anatomy of an Rmd

Table of contents

‘Short’ or ‘long’ version: with code included or without

Controls printing of warnings/messages

Custom figure / cache paths

Stop on error!
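The annotated screenshot is missing from the transcript, so the following is only a sketch of how a setup chunk might set these global options with knitr::opts_chunk$set(). The variable show_code and the directory names are assumptions chosen for illustration.

```r
library(knitr)

# Hypothetical switch between the 'long' (with code) and 'short' version
show_code <- FALSE

opts_chunk$set(
  echo       = show_code,   # include or hide code in the report
  warning    = FALSE,       # do not print warnings
  message    = FALSE,       # do not print messages
  fig.path   = "figures/",  # custom figure directory (assumed name)
  cache.path = "cache/",    # custom cache directory (assumed name)
  error      = FALSE        # stop knitting at the first error
)

# The table of contents is requested in the YAML header instead, e.g.
#   output:
#     html_document:
#       toc: true
opts_chunk$get("fig.path")
```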

Anatomy of an Rmd

Second-level header

Links

(you can use similar syntax to insert image files)

Load all packages (do not cache!)

Keep functions in one place

Anatomy of an Rmd

Cache data loading / processing

Code formatting within text using backticks: `function()`

Control figure size for a specific chunk

It’s a good idea to name chunks – will be used to name figures
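To illustrate chunk names and per-chunk figure sizes, this sketch knits a single named chunk and lists the figure file that results. The chunk label, the sizes and the temporary figure directory are assumptions; the PDF device is used because it is available in any R installation.

```r
library(knitr)

# Absolute figure directory so the result does not depend on the working
# directory (forward slashes keep the path valid inside the chunk header)
fig_dir <- file.path(normalizePath(tempdir(), winslash = "/"), "figs")

rmd_lines <- c(
  "Plotted with `plot()`:",
  "",
  sprintf(
    "```{r speed-vs-dist, fig.width=4, fig.height=3, dev='pdf', fig.path='%s/'}",
    fig_dir
  ),
  "plot(cars)",
  "```"
)

# knit(text = ...) returns the knitted Markdown as a character string
out <- knit(text = rmd_lines, quiet = TRUE)

# The figure file name starts with the chunk label
list.files(fig_dir)
```

Adding cache=TRUE to the header of a slow data loading chunk works the same way, one option per chunk.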

Anatomy of an Rmd

Code can be included for demonstration but not evaluated

Here the data is loaded from the package instead
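A sketch of the same idea: the first chunk is displayed but not run because of eval=FALSE, and the second loads a built-in data set instead. The chunk labels and the file path are hypothetical.

```r
library(knitr)

rmd_lines <- c(
  "```{r load-from-file, eval=FALSE}",
  "# shown in the report but never run (hypothetical file name)",
  "my_data <- read.csv(\"path/to/my_data.csv\")",
  "```",
  "",
  "```{r load-from-package}",
  "my_data <- datasets::cars",
  "nrow(my_data)",
  "```"
)

# The read.csv() call appears in the output, but only the second chunk runs
out <- knit(text = rmd_lines, quiet = TRUE)
cat(out)
```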

Anatomy of an Rmd

Format tables with knitr::kable()

You can include citations from (e.g.) a BibTeX file in an Rmd!

(but it’s not worth it for two)

Include session info to track package versions used!

I also add the time the document was created

You can include evaluated R code in the text by using `r `
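The calls behind the last few annotations might look like the sketch below, which assumes knitr is installed and uses the built-in cars data as a stand-in for real results.

```r
library(knitr)

# kable() turns a data frame into a Markdown table
tbl <- kable(head(cars, 3))
print(tbl)

# Record when the document was built and with which package versions
created <- format(Sys.time(), "%Y-%m-%d %H:%M")
info <- sessionInfo()
print(created)
print(info)
```

In the text of an Rmd, the timestamp could instead be written inline as `r format(Sys.time(), "%Y-%m-%d %H:%M")`.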

Other tips and tricks

• You can set multiple figure devices, e.g. dev = c('pdf', 'png')

• Disable lazy loading for very large caches (cache.lazy = FALSE)

• ‘dependson’ can be used to set dependencies between chunks
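As a rough sketch, the first two tips can be set once for the whole document, while dependson belongs in the header of an individual chunk. The chunk labels in the comment are hypothetical.

```r
library(knitr)

opts_chunk$set(
  dev        = c("pdf", "png"),  # save every figure in both formats
  cache.lazy = FALSE             # load cached objects eagerly, not lazily
)

# dependson is set per chunk, in the chunk header, for example:
#   {r fit-model, cache=TRUE, dependson="load-data"}
# so the cache of fit-model is rebuilt when the cache of load-data is rebuilt.
opts_chunk$get("dev")
```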

Other tips and tricks

• File paths:
  – Either relative to the Rmd location or set as a variable

• Consider directory structure
  – (e.g. nicercode.github.io/blog/2013-04-05-projects/)

• Use set.seed() if using any random numbers
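A small base R sketch of these tips: paths are built from one variable, and the random seed is set before drawing random numbers. The directory layout, file name and seed value are assumptions, and tempdir() stands in for a real project root.

```r
# One variable holds the project root; every other path is built from it
project_dir <- tempdir()  # stand-in for your own project directory
data_dir <- file.path(project_dir, "data")
dir.create(data_dir, showWarnings = FALSE)

# Hypothetical input file, written here only so the example is self-contained
input_file <- file.path(data_dir, "counts.csv")
write.csv(data.frame(x = 1:3), input_file, row.names = FALSE)
counts <- read.csv(input_file)

# Setting the seed makes random draws repeatable between runs
set.seed(2015)
first <- rnorm(5)
set.seed(2015)
second <- rnorm(5)
identical(first, second)  # TRUE
```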

Resources

yihui.name/knitr: Official site with examples and documentation (there’s also a knitr book)

kbroman.org/knitr_knutshell/: Really good knitr tutorial

kbroman.org/steps2rr/: Other reproducibility tips

rmarkdown.rstudio.com/: R Markdown info including cheatsheets

Other reproducibility tools

• Jupyter (formerly IPython Notebook):
  – Similar in concept to knitr but for interactive use (jupyter.org/)

• Make (and similar tools):
  – Automated building of project outputs

• Docker (Rocker):
  – Containers for code, like a lightweight virtual machine
