What is Reproducibility? The R* brouhaha (and how Research Objects can help)
Professor Carole Goble
The University of Manchester, UK
Software Sustainability Institute, UK
ELIXIR-UK, FAIRDOM Association e.V.
[email protected]
First International Workshop on Reproducible Open Science @ TPDL, 9 Sept 2016, Hannover, Germany
Acknowledgements
• Dagstuhl Seminar 16041, January 2016
  – http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=16041
• ATI Symposium Reproducibility, Sustainability and Preservation, April
“When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean - neither more nor less.”
Carroll, Through the Looking Glass
re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo, remix
robustness, tolerance, verification, compliance, validation, assurance
Reproducibility of Reproducibility Research
Computational Science
http://tpeterka.github.io/maui-project/
From: The Future of Scientific Workflows, Report of DOE Workshop 2015, http://science.energy.gov/~/media/ascr/pdf/programdocuments/docs/workflows_final_report.pd
1. Observational, experimental
2. Theoretical
3. Simulation
4. Data intensive
BioSTIF
Computational Science
Scientific publications have two goals: (i) announce a result and (ii) convince readers it's correct.
Papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension.
Papers in computational science should describe the results and provide the complete software development environment, data and set of instructions which generated the figures.
Virtual Witnessing*
*Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer.
Jill Mesirov
David Donoho
Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations, Tools and apps, services
Research: RACE
1. For Every Result, Keep Track of How It Was Produced
2. Avoid Manual Data Manipulation Steps
3. Archive the Exact Versions of All External Programs Used
4. Version Control All Custom Scripts
5. Record All Intermediate Results, When Possible in Standardized Formats
6. For Analyses That Include Randomness, Note Underlying Random Seeds
7. Always Store Raw Data behind Plots
8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
9. Connect Textual Statements to Underlying Results
10. Provide Public Access to Scripts, Runs, and Results
Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
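A few of the rules listed by Sandve et al. (keep track of how a result was produced, archive exact versions, note random seeds) can be illustrated with a minimal Python sketch. The function names `run_analysis` and `provenance_record` and the record fields are hypothetical, chosen only for illustration; they come neither from the paper nor from this talk.

```python
import hashlib
import json
import platform
import random
import sys
from typing import Dict, List


def run_analysis(seed: int, n: int = 5) -> List[float]:
    """Toy stand-in for an analysis step that involves randomness."""
    # A local generator means the seed alone determines the output.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def provenance_record(seed: int, result: List[float]) -> Dict[str, object]:
    """Collect facts that help trace how a result was produced."""
    payload = json.dumps(result).encode("utf-8")
    return {
        "seed": seed,                                   # note the random seed
        "python_version": platform.python_version(),    # interpreter version used
        "platform": sys.platform,                       # operating system family
        "result_sha256": hashlib.sha256(payload).hexdigest(),  # fingerprint of the output
    }


first = run_analysis(seed=42)
second = run_analysis(seed=42)   # rerun with the same seed
record = provenance_record(42, first)
print(json.dumps(record, indent=2))
```

Storing such a record next to each output makes it possible to check a later rerun with the same seed against the stored fingerprint. This is only one of many ways to follow the rules.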
Record Everything
Automate Everything
Contain Everything
Expose Everything
Preparation pain
independent testing trials and tribulations
[Norman Morrison]
replication hostility: no funding, time, recognition, place to publish
resource intensive: access to the complete environment
Lab Analogy: Witnessing “Datascopes”
Input Data
Software
Output Data
Config
Parameters
Methods: techniques, algorithms, spec. of the steps, models
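The parts of the "datascope" named above (input data, software, config, parameters, output data) could be recorded together in a single manifest. The following Python sketch assumes a hypothetical `RunManifest` class with made-up field names and a toy analysis; it is not the Research Object format, just an illustration of bundling the parts with checksums.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class RunManifest:
    """Hypothetical record of one run: software, config, parameters, data."""
    software: str
    config: Dict[str, str]
    parameters: Dict[str, float]
    input_checksums: Dict[str, str] = field(default_factory=dict)
    output_checksums: Dict[str, str] = field(default_factory=dict)

    def add_input(self, name: str, data: bytes) -> None:
        self.input_checksums[name] = sha256_of(data)

    def add_output(self, name: str, data: bytes) -> None:
        self.output_checksums[name] = sha256_of(data)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Toy run: sum all numbers in a small CSV payload (illustrative data only).
raw = b"1,2,3\n4,5,6\n"
total = sum(int(x) for line in raw.decode().splitlines() for x in line.split(","))
output = f"{total}\n".encode("utf-8")

manifest = RunManifest(
    software="toy-analysis 0.1 (hypothetical)",
    config={"mode": "demo"},
    parameters={"threshold": 0.5},
)
manifest.add_input("measurements.csv", raw)
manifest.add_output("total.txt", output)
print(manifest.to_json())
```

A reader who has the manifest and the files can recompute the checksums to confirm they are looking through the same "datascope" as the original run.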
Methods Reproducibility
the provision of enough detail about study procedures and data so the same procedures could, in theory or in actuality, be exactly repeated.
Result Reproducibility (aka replicability)
obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible
What does research reproducibility mean? Steven N. Goodman, Daniele Fanelli, John P. A. Ioannidis Science Translational Medicine 8 (341), 341ps12. [doi: 10.1126/scitranslmed.aaf5027] http://stm.sciencemag.org/content/scitransmed/8/341/341ps12.full.pdf
reviewers want additional work
statistician wants more runs
analysis needs to be repeated
post-doc leaves, student arrives
new/revised datasets
updated/new versions of algorithms/codes
sample was contaminated
better kit - longer simulations
new partners, new projects
Personal & Lab Productivity
Public Good
Reproducibility
“Datascope” Lab Analogy
Methods: techniques, algorithms, spec. of the steps, models