Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks. Professor Carole Goble CBE FREng FBCS, The University of Manchester, UK; The Software Sustainability Institute. carole.goble@manchester.ac.uk. iConference, 26 March 2015, Newport Beach, Los Angeles, USA
What do I do? CyberInfrastructure EcoSystems

e-Lab Collabs & Shared Asset Repositories

Knowledge, Metadata, Linked Data, Ontologies

Software Engineering for Scientists

Computational Workflow Systems

Scholarly Comms

Reproducibility

MicroPublications

Open Science

Research Objects

Linked Data for Science

Scientific EgoSystems

Biodiversity

Systems Biology

Synthetic Biology

Astronomy

HelioPhysics

Genomics

Health Epidemiology

Digital Preservation

Social Science

Pharmacology

Knowledge Turning Flow

Barriers to Cure

» Access to scientific resources

» Coordination and Collaboration

» Flow of Information

http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation

[Pettifer, Attwood]

http://getutopia.com

Virtual Witnessing: Scientific publications

» announce a result

» convince readers the result is correct

“papers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extension”

Jill Mesirov, Broad Institute, 2010

Accessible Reproducible Research, Science, 22 January 2010, Vol. 327 no. 5964, pp. 415-416, DOI: 10.1126/science.1179653

Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985), Shapin and Schaffer

Bramhall et al., Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015

“Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…”

http://www.nature.com/news/male-researchers-stress-out-rodents-1.15106

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”

David Donoho, “Wavelab and Reproducible Research”, 1995

Datasets, Data collections

Standard operating procedures

Software, algorithms

Configurations, Tools and apps, services

Codes, code libraries

Workflows, scripts

System software, Infrastructure, Compilers, hardware

Morin et al., Shining Light into Black Boxes, Science 2012: 336(6078) 159-160

Ince et al., The case for open computer programs, Nature 482, 2012

50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads

31: no s/w version, parameters, exact version of genomic reference sequence

26: no access to primary data sets

Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)
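
The details found missing above (software version, parameters, exact reference sequence version, location of primary data) could in principle be captured automatically at run time. A minimal sketch in Python, assuming a hypothetical `build_run_record` helper; the tool name, version string, parameter names and data location are placeholders for illustration only, not values from the cited study:

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return a hex checksum that pins down the exact bytes that were used."""
    return hashlib.sha256(data).hexdigest()


def build_run_record(tool, tool_version, parameters,
                     reference_name, reference_bytes, primary_data):
    """Collect the facts a reader would need to repeat an analysis run.

    Every field name here is an assumption made for this sketch.
    """
    return {
        "tool": tool,
        "tool_version": tool_version,
        # Sort parameters so two records of the same run compare equal.
        "parameters": dict(sorted(parameters.items())),
        "reference": {
            "name": reference_name,
            "sha256": sha256_of(reference_bytes),
        },
        "primary_data": primary_data,
        "python_version": platform.python_version(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    record = build_run_record(
        tool="example-aligner",            # placeholder tool name
        tool_version="0.0.0-example",      # placeholder version
        parameters={"threads": 4, "seed_length": 32},
        reference_name="example-reference",
        reference_bytes=b"ACGTACGT",       # stand-in for a reference file
        primary_data="doi:example/placeholder",
    )
    print(json.dumps(record, indent=2))
```

Storing a checksum of the reference sequence, rather than only its name, is one way to record the "exact version" that the survey found was usually left out.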

Broken software, broken science

» Geoffrey Chang, Scripps Institute

» Homemade data-analysis program inherited from another lab

» Flipped two columns of data, inverting the electron-density map used to derive protein structure

» Retract 3 Science papers and 2 papers in other journals

» One paper cited by 364

The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller, A Scientist’s Nightmare: Software Problem Leads to Five Retractions, Science, 22 December 2006, vol. 314 no. 5807, 1856-1857

http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”

2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?”, Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8

republic of science

regulation of science

institution cores libraries

Merton’s four norms of scientific behaviour (1942)

public services

Tools, Standards, Machine actionable Formats, Reporting, Policies, Practices

Record and Automate Everything

Potential Trace Heaven Folks

recomputation.org

sciencecodemanifesto.org

Honest Error Science is messy

Inherent

Reinhart/Rogoff austerity economics, Thomas Herndon

Nature, Oct ’12

Zoë Corbyn

Fraud

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources, results, data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

Morrison

Do a Replication Study? No thanks! Not FAIR

Hard. Resource intensive. Unrecognised. Trolled. Just gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts. Subject specific. General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

Process at Scale. More on Models

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org

http://www.fair-dom.org

http://isatools.org

Pop-Up Start Ups: Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It’s all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It’s all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite? resolve? steward?

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite? resolve? steward?

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UniProt): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data. Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency: Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code. Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate, rerun

repeat

re-examine

repurpose

recreate

reuse

restore, reconstruct, review

regenerate, revise

recycle

redo

What IS reproducibility? Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011: 1226-1227
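
The four parts named on this slide (input data, software, config/parameters, output data) can be treated as a checklist for whether a computational experiment is complete enough to be rerun. A minimal sketch in Python; the class name, field names and example values are hypothetical and are not part of any Research Object specification:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ComputationalExperiment:
    """One run described by the four parts listed on the slide."""

    input_data: List[str] = field(default_factory=list)
    software: List[str] = field(default_factory=list)
    config_parameters: Dict[str, str] = field(default_factory=dict)
    output_data: List[str] = field(default_factory=list)

    def missing_parts(self) -> List[str]:
        """Name every part that has not been recorded yet."""
        parts = {
            "input data": self.input_data,
            "software": self.software,
            "config/parameters": self.config_parameters,
            "output data": self.output_data,
        }
        return [name for name, value in parts.items() if not value]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    experiment = ComputationalExperiment(
        input_data=["reads.fastq"],
        software=["example-aligner 0.0.0"],
    )
    print(experiment.missing_parts())
```

An empty result from `missing_parts()` only says that each part was recorded, not that the recorded items are still available or still work.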

1. Science Changes. So does the Lab

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed: Updated resources

Uncertainty, BioSTIF

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl. Conf. e-Science, 2012

2. Instruments Break, Labs Decay: materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency, dependencies

steps, features, provenance trace

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P. et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014: 84-96

Reproduce by Reading: Archived Record. Retain the Process/Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though. docker.com

Reproduce by Running: Active Instrument. Retain the bits

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J. Zhao, G. Klyne, M. Gamble, C.A. Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
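
The Container plus Manifest idea can be sketched as a ZIP file that holds the aggregated resources together with a JSON manifest listing them. The Python below is a simplified illustration loosely modelled on the Research Object Bundle specification, not a conforming implementation: the media type string, the `.ro/manifest.json` path and the manifest keys reflect my reading of that specification and should be checked against it, and `write_bundle` and the file names are hypothetical.

```python
import io
import json
import zipfile

# Assumed media type for RO bundles; verify against the specification.
BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(target, files):
    """Write `files` (archive path -> text content) plus a manifest to `target`.

    `target` may be a file name or a writable binary file-like object.
    Returns the manifest dictionary that was stored in the archive.
    """
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        # List each aggregated resource by its path inside the container.
        "aggregates": [{"uri": "/" + name} for name in sorted(files)],
    }
    with zipfile.ZipFile(target, "w") as bundle:
        # Store the media type first and uncompressed, so that tools can
        # identify the container without unpacking it.
        bundle.writestr("mimetype", BUNDLE_MIMETYPE,
                        compress_type=zipfile.ZIP_STORED)
        for name in sorted(files):
            bundle.writestr(name, files[name],
                            compress_type=zipfile.ZIP_DEFLATED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2),
                        compress_type=zipfile.ZIP_DEFLATED)
    return manifest


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_bundle(buffer, {
        "data/results.csv": "a,b\n1,2\n",       # placeholder content
        "scripts/analyse.py": "print('hi')\n",  # placeholder content
    })
    with zipfile.ZipFile(buffer) as bundle:
        print(bundle.namelist())
```

Keeping the manifest inside the same file as the resources is what lets the bundle travel between platforms as a single unit of exchange.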

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn’t work out all the details

I didn’t actually write the code -- my student did

My competitors would be unfair to me

It’s valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011/

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation: span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship. Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible [1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles. 24 journals had citation policy

BUT……

two years’ time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Page 2: I conference2015 goble-finalupload

What do I do CyberInfrastructure EcoSystems

e-Lab Collabs ampShared Asset Repositories

Knowledge Metadata Linked Data Ontologies

Software Engineering for Scientists

ComputationalWorkflow Systems

Scholarly Comms

Reproducibility

MicroPublications

Open Science

Research Objects

Linked Data forScience

Scientific EgoSystems

Biodiversity

Systems Biology

Synthetic Biology

Astronomy

HelioPhysics

Genomics

Health Epidemiology

Digital Preservation

Social Science

Pharmacology

Knowledge Turning Flow

Barriers to Cure

raquo Access to scientific resources

raquo Coordination and Collaboration

raquo Flow of Information

httpforatv20100423Sage_Commons_Josh_Sommer_Chordoma_Foundation

[Pettifer Attwood]

httpgetutopiacom

Virtual WitnessingScientific publications

raquo announce a result

raquo convince readers the result is correct

ldquopapers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extensionrdquo

Jill Mesirov Broad Institute 2010

Accessible Reproducible Research Science 22 January 2010 Vol 327 no 5964 pp 415-416 DOI 101126science1179653

Leviathan and the Air-Pump Hobbes Boyle and the Experimental Life (1985) Shapin and Schaffer

Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015

ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo

httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106

ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo

David Donoho ldquoWavelab and Reproducible Researchrdquo 1995

Datasets Data collectionsStandard operating proceduresSoftware algorithmsConfigurations Tools and apps servicesCodes code librariesWorkflows scriptsSystem software Infrastructure Compilers hardware

Morin et al Shining Light into Black Boxes Science 2012 336(6078) 159-160 Ince et al The case for open computer programs Nature 482 2012

50 papers randomly chosen from 378

manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads

31 no sw version parameters exact

version of genomic reference sequence

26 no access to primary data sets

Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)

Broken software Broken science

raquo Geoffrey Chang Scripps Institute

raquo Homemade data-analysis program inherited from another lab

raquo Flipped two columns of data inverting the electron-density map used to derive protein structure

raquo Retract 3 Science papers and 2 papers in other journals

raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

ldquoAs a general rule

researchers do not

test or document their

programs rigorously

and they rarely

release their codes

making it almost

impossible to

reproduce and verify

published results

generated by

scientific softwarerdquo

2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

Potential Trace Heaven Folks

recomputationorg

sciencecodemanifestoorg

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
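
The container-plus-manifest idea on this slide can be sketched in a few lines. This is a minimal illustration, assuming a ZIP container whose first entry is a `mimetype` file and whose manifest sits at `.ro/manifest.json`, loosely following the RO Bundle specification linked above. The helper name `build_bundle`, the example file names and the reduced set of manifest fields are assumptions made for illustration; this is not a conformant implementation.

```python
import io
import json
import zipfile

# Media type as given in the RO Bundle specification; the rest is illustrative.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def build_bundle(resources):
    """Pack named resources (path -> bytes) and a manifest into one ZIP.

    The manifest here lists only the aggregated paths; a real manifest
    would also carry authorship, provenance and annotations.
    """
    manifest = {
        "id": "/",
        "aggregates": [{"uri": "/" + name} for name in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry goes first and is stored uncompressed.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
        for name in sorted(resources):
            archive.writestr(name, resources[name])
    return buffer.getvalue()


if __name__ == "__main__":
    bundle = build_bundle({
        "data/input.csv": b"a,b\n1,2\n",        # hypothetical example data
        "scripts/run.py": b"print('hello')\n",  # hypothetical example script
    })
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        print(archive.namelist())
```

The point of the design is that the manifest travels inside the container, so the bundle stays self-describing when it moves between platforms.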

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority.

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles, 24 journals had citation policy

BUT……

two years time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble


Bramhall et al., Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015

“Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…”

http://www.nature.com/news/male-researchers-stress-out-rodents-1.15106

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”

David Donoho, “Wavelab and Reproducible Research”, 1995

Datasets, Data collections

Standard operating procedures

Software, algorithms

Configurations, Tools and apps, services

Codes, code libraries

Workflows, scripts

System software, Infrastructure, Compilers, hardware

Morin et al., Shining Light into Black Boxes, Science 2012, 336(6078): 159-160

Ince et al., The case for open computer programs, Nature 482, 2012

50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads

31 no sw version, parameters, exact version of genomic reference sequence

26 no access to primary data sets

Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)

Broken software Broken science

» Geoffrey Chang, Scripps Institute

» Homemade data-analysis program inherited from another lab

» Flipped two columns of data, inverting the electron-density map used to derive protein structure

» Retract 3 Science papers and 2 papers in other journals

» One paper cited by 364

The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006, vol 314 no 5807, 1856-1857

http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”

2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8

republic of science

regulation of science

institution cores libraries

Merton’s four norms of scientific behaviour (1942)

public services

Tools, Standards, Machine actionable Formats, Reporting, Policies, Practices

Record and Automate Everything

Potential Trace Heaven Folks

recomputation.org

sciencecodemanifesto.org

Honest Error Science is messy

Inherent

Reinhart/Rogoff Austerity economics

Thomas Herndon

Nature Oct ’12

Zoë Corbyn

Fraud

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources, results, data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

When research goes “wrong”

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study? No thanks. Not FAIR

Hard, Resource intensive

Unrecognised, Trolled

Just gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

Process at Scale

More on Models

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups

Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It's all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013
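
The distinction between defending a snapshot and locating the most recent version can be sketched as a tiny resolver. This is an illustration only, assuming a Research Object is released as an ordered series of versions under one identifier. The class name `ResearchObjectRecord`, its methods and the citation string format are hypothetical and do not come from any Research Object specification.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchObjectRecord:
    """One identifier with an ordered list of released versions (hypothetical)."""

    identifier: str
    versions: list = field(default_factory=list)  # oldest first

    def release(self, content):
        """Add a new version and return its 1-based version number."""
        self.versions.append(content)
        return len(self.versions)

    def resolve(self, version=None):
        """Return the latest version by default, or a fixed snapshot if asked."""
        if not self.versions:
            raise LookupError(f"{self.identifier} has no released versions")
        if version is None:
            return self.versions[-1]  # "locate it": most recent
        if not 1 <= version <= len(self.versions):
            raise LookupError(f"{self.identifier} has no version {version}")
        return self.versions[version - 1]  # "defend it": snapshot

    def cite(self, version=None):
        """Build a citation string that always pins a concrete version."""
        number = len(self.versions) if version is None else version
        self.resolve(number)  # raises if the version does not exist
        return f"{self.identifier} (version {number})"


if __name__ == "__main__":
    ro = ResearchObjectRecord("ro-example")  # hypothetical identifier
    ro.release("first analysis")
    ro.release("analysis with revised data")
    print(ro.resolve())   # most recent content
    print(ro.resolve(1))  # snapshot that a paper could defend
    print(ro.cite())
```

Pinning a version in every citation keeps stable study records and evolving knowledge under the same identifier without confusing the two.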

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data. Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B, Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is], F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

Config Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
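
The split into methods, materials, instruments and laboratory can be made concrete by recording each part alongside a run. The sketch below is illustrative only: the function `run_record`, the toy analysis `count_lines` and the field names are assumptions, and a real record would also capture library versions and random seeds.

```python
import hashlib
import platform
import sys


def sha256_of(data):
    """Fingerprint bytes so a reader can check they hold the same material."""
    return hashlib.sha256(data).hexdigest()


def count_lines(data, skip_header=True):
    """Toy analysis standing in for the instrument: count data lines."""
    lines = data.splitlines()
    if skip_header:
        lines = lines[1:]
    return str(len(lines)).encode()


def run_record(input_data, parameters, analysis):
    """Run the analysis and describe the run in the slide's terms."""
    output = analysis(input_data, **parameters)
    return {
        # Materials: what went in, identified by content hash.
        "materials": {
            "input_sha256": sha256_of(input_data),
            "parameters": dict(parameters),
        },
        # Instrument: the code that was run.
        "instrument": {"function": analysis.__name__},
        # Laboratory: the environment the run happened in.
        "laboratory": {
            "python": platform.python_version(),
            "platform": sys.platform,
        },
        "output_sha256": sha256_of(output),
    }


if __name__ == "__main__":
    record = run_record(b"id,value\n1,10\n2,20\n", {"skip_header": True}, count_lines)
    for part, detail in record.items():
        print(part, detail)
```

Hashing inputs and outputs instead of copying them keeps the record small while still letting someone else verify that a rerun used and produced the same bits.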


Page 4: I conference2015 goble-finalupload

Knowledge Turning Flow

Barriers to Cure

raquo Access to scientific resources

raquo Coordination and Collaboration

raquo Flow of Information

httpforatv20100423Sage_Commons_Josh_Sommer_Chordoma_Foundation

[Pettifer Attwood]

httpgetutopiacom

Virtual WitnessingScientific publications

raquo announce a result

raquo convince readers the result is correct

ldquopapers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extensionrdquo

Jill Mesirov Broad Institute 2010

Accessible Reproducible Research Science 22 January 2010 Vol 327 no 5964 pp 415-416 DOI 101126science1179653

Leviathan and the Air-Pump Hobbes Boyle and the Experimental Life (1985) Shapin and Schaffer

Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015

ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo

httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106

ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo

David Donoho ldquoWavelab and Reproducible Researchrdquo 1995

Datasets Data collectionsStandard operating proceduresSoftware algorithmsConfigurations Tools and apps servicesCodes code librariesWorkflows scriptsSystem software Infrastructure Compilers hardware

Morin et al Shining Light into Black Boxes Science 2012 336(6078) 159-160 Ince et al The case for open computer programs Nature 482 2012

50 papers randomly chosen from 378

manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads

31 no sw version parameters exact

version of genomic reference sequence

26 no access to primary data sets

Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)

Broken software Broken science

raquo Geoffrey Chang Scripps Institute

raquo Homemade data-analysis program inherited from another lab

raquo Flipped two columns of data inverting the electron-density map used to derive protein structure

raquo Retract 3 Science papers and 2 papers in other journals

raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

ldquoAs a general rule

researchers do not

test or document their

programs rigorously

and they rarely

release their codes

making it almost

impossible to

reproduce and verify

published results

generated by

scientific softwarerdquo

2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

Potential Trace Heaven Folks

recomputationorg

sciencecodemanifestoorg

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness, tolerance

verification, compliance

validation, assurance

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online. Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed: updated resources

Uncertainty, BioSTIF

Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012

2. Instruments Break, Labs Decay: materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online. Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
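A minimal sketch of how the instrument view above might be recorded for a single computational run: materials (input data, parameters, seed), instruments (the script) and laboratory (interpreter and operating system). The function name `describe_run` and the record layout are illustrative assumptions, not part of any Research Object specification.

```python
import hashlib
import json
import platform
import sys


def sha256_of(data: bytes) -> str:
    """Fingerprint a blob so a later rerun can check it uses the same bytes."""
    return hashlib.sha256(data).hexdigest()


def describe_run(input_data: bytes, script_source: bytes, params: dict, seed: int) -> dict:
    """Collect what a reader would need to rerun the experiment (hypothetical layout)."""
    return {
        # Materials: datasets, parameters, algorithm seeds
        "materials": {
            "input_sha256": sha256_of(input_data),
            "parameters": params,
            "seed": seed,
        },
        # Instruments: the code that was actually executed
        "instruments": {
            "script_sha256": sha256_of(script_source),
        },
        # Laboratory: the software and hardware environment
        "laboratory": {
            "python": platform.python_version(),
            "implementation": platform.python_implementation(),
            "os": platform.system(),
        },
    }


if __name__ == "__main__":
    record = describe_run(b"1,2,3\n", b"print('analysis')\n", {"threshold": 0.5}, seed=42)
    json.dump(record, sys.stdout, indent=2, sort_keys=True)
```

Hashing the inputs rather than copying them keeps the record small; a fuller record would also list library versions, which this sketch leaves out.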

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency: dependencies, steps, features, provenance trace

portability

robustness

preservation

access: available

description: intelligible

standards: common APIs

licensing

standards: common metadata

change management: versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al, LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading: Archived Record, Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument, Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles


Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource


standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J Zhao, G Klyne, M Gamble, CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle

Bergmann et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
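The container-plus-manifest idea can be sketched in a few lines: a ZIP archive carrying the resources together with a manifest that lists each one with a checksum. This is an illustrative layout only. The file name `manifest.json`, the `aggregates` key and both function names are assumptions made for this sketch; the Research Object Bundle and OMEX formats define their own manifest locations and vocabularies.

```python
import hashlib
import io
import json
import zipfile


def build_bundle(resources: dict) -> bytes:
    """Pack named resources plus a manifest into one in-memory ZIP archive."""
    manifest = {
        "aggregates": [
            {"path": name, "sha256": hashlib.sha256(content).hexdigest()}
            for name, content in sorted(resources.items())
        ]
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical manifest location; real bundle formats define their own.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, content in sorted(resources.items()):
            archive.writestr(name, content)
    return buffer.getvalue()


def verify_bundle(bundle: bytes) -> bool:
    """Check that every file listed in the manifest is present and unmodified."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        for entry in manifest["aggregates"]:
            try:
                content = archive.read(entry["path"])
            except KeyError:
                return False  # listed in the manifest but missing from the archive
            if hashlib.sha256(content).hexdigest() != entry["sha256"]:
                return False
    return True
```

A bundle built from, say, a cohort description and a statistical script can then be exchanged as one file and checked on receipt with `verify_bundle`.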

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check


Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News. Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love, money

fame, duty

fear, time/effort

shame, duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible [1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles, 24 journals had citation policy

BUT……

two years time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contacthellip

Professor Carole Goble

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Page 5: I conference2015 goble-finalupload


Bramhall et al, Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015

“Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…”

http://www.nature.com/news/male-researchers-stress-out-rodents-1.15106

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”

David Donoho, “Wavelab and Reproducible Research”, 1995

Datasets, Data collections

Standard operating procedures

Software, algorithms

Configurations, Tools and apps, services

Codes, code libraries

Workflows, scripts

System software, Infrastructure, Compilers, hardware

Morin et al, Shining Light into Black Boxes, Science 2012, 336(6078), 159-160. Ince et al, The case for open computer programs, Nature 482, 2012

50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads

31: no software version, parameters, exact version of genomic reference sequence

26: no access to primary data sets

Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)

Broken software Broken science

» Geoffrey Chang, Scripps Institute

» Homemade data-analysis program inherited from another lab

» Flipped two columns of data, inverting the electron-density map used to derive protein structure

» Retract 3 Science papers and 2 papers in other journals

» One paper cited by 364

The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science, 22 December 2006, vol 314, no 5807, 1856-1857

http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”

2000 scientists. JE Hannay et al, “How Do Scientists Develop and Use Scientific Software?” Proc ICSE Workshop Software Eng for Computational Science and Eng, 2009, pp 1–8

republic of science

regulation of science

institution cores libraries

Merton’s four norms of scientific behaviour (1942)

public services

Tools, Standards, Machine actionable Formats, Reporting, Policies, Practices

Record and Automate Everything

Potential Trace Heaven Folks

recomputation.org

sciencecodemanifesto.org

Honest Error: Science is messy

Inherent

Reinhart/Rogoff Austerity economics, Thomas Herndon

Nature Oct ’12

Zoë Corbyn

Fraud

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources, results: data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005. Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

When research goes “wrong”

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study? No thanks. Not FAIR

Hard, Resource intensive

Unrecognised, Trolled

Just gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts, Subject specific, General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015 http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

Process at Scale

More on Models

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups: Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It's all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case

Research Objects

Compound, Interconnected Investigations, Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite resolve steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)

Biological Study Records (eg PRIDE): stable

Biological Knowledge (eg UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013
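The difference between defending a snapshot and locating the most recent version can be sketched as a small release history. Everything here (the class name `ResearchObjectHistory`, the identifier scheme, the method names) is a hypothetical illustration rather than an API of any Research Object platform.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchObjectHistory:
    """Ordered releases of one research object (hypothetical structure)."""

    identifier: str
    releases: list = field(default_factory=list)  # (version, content) pairs, oldest first

    def release(self, version: str, content: str) -> str:
        """Record a new release and return a citable, version-specific identifier."""
        self.releases.append((version, content))
        return f"{self.identifier}/{version}"

    def snapshot(self, version: str) -> str:
        """'Defend it': return exactly the content that was cited."""
        for released_version, content in self.releases:
            if released_version == version:
                return content
        raise KeyError(version)

    def latest(self) -> str:
        """'Locate it': return the most recent release."""
        if not self.releases:
            raise LookupError("no releases yet")
        return self.releases[-1][1]
```

Citing the version-specific identifier keeps a stable study record defensible, while `latest` follows knowledge that keeps evolving.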

means

ends

driver


Page 6: I conference2015 goble-finalupload

Virtual WitnessingScientific publications

raquo announce a result

raquo convince readers the result is correct

ldquopapers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extensionrdquo

Jill Mesirov Broad Institute 2010

Accessible Reproducible Research Science 22 January 2010 Vol 327 no 5964 pp 415-416 DOI 101126science1179653

Leviathan and the Air-Pump Hobbes Boyle and the Experimental Life (1985) Shapin and Schaffer

Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015

ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo

httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106

ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo

David Donoho ldquoWavelab and Reproducible Researchrdquo 1995

Datasets Data collectionsStandard operating proceduresSoftware algorithmsConfigurations Tools and apps servicesCodes code librariesWorkflows scriptsSystem software Infrastructure Compilers hardware

Morin et al Shining Light into Black Boxes Science 2012 336(6078) 159-160 Ince et al The case for open computer programs Nature 482 2012

50 papers randomly chosen from 378

manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads

31 no sw version parameters exact

version of genomic reference sequence

26 no access to primary data sets

Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)

Broken software Broken science

raquo Geoffrey Chang Scripps Institute

raquo Homemade data-analysis program inherited from another lab

raquo Flipped two columns of data inverting the electron-density map used to derive protein structure

raquo Retract 3 Science papers and 2 papers in other journals

raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

ldquoAs a general rule

researchers do not

test or document their

programs rigorously

and they rarely

release their codes

making it almost

impossible to

reproduce and verify

published results

generated by

scientific softwarerdquo

2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

Potential Trace Heaven Folks

recomputationorg

sciencecodemanifestoorg

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward

Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
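
The resolution behaviours listed here (a fixed snapshot to defend, the most recent version to locate, contributors to credit) can be illustrated with a small sketch. This is only a toy model under stated assumptions: the `Registry` and `Snapshot` classes, the `ro:example` identifier and the citation string format are all hypothetical and are not part of any Research Object specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """An immutable, citable version of a Research Object (hypothetical model)."""
    ro_id: str
    version: int
    contributors: tuple


@dataclass
class Registry:
    """Toy in-memory registry; a real commons would persist and mint identifiers."""
    versions: dict = field(default_factory=dict)

    def release(self, ro_id, contributors):
        """Freeze a new snapshot and append it to the object's history."""
        history = self.versions.setdefault(ro_id, [])
        snap = Snapshot(ro_id, len(history) + 1, tuple(contributors))
        history.append(snap)
        return snap

    def resolve(self, ro_id, version=None):
        """No version: 'locate it' (most recent). With version: a fixed snapshot."""
        history = self.versions[ro_id]
        if version is None:
            return history[-1]
        return history[version - 1]

    def cite(self, snap):
        """'Credit it': name every contributor of the cited snapshot."""
        names = ", ".join(snap.contributors)
        return f"{names}. {snap.ro_id} (version {snap.version})"
```

Citing a snapshot rather than the bare identifier is the design choice that matters: the cited version stays defensible even after later releases change the "most recent" answer.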

Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts, exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency
Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures
versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B, Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is], F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo

What IS reproducibility?
Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab.

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed
Updated resources

Uncertainty
BioSTIF

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012

2. Instruments Break, Labs Decay
materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
  › form or function
  › preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup
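
One way to read the materials / instruments / laboratory split is as a record captured alongside every run. The following is a minimal sketch under stated assumptions: the function names, the dictionary layout and the stand-in "analysis" (a seeded random sample) are hypothetical illustrations, not a format defined by the Research Object work.

```python
import hashlib
import json
import platform
import random
import sys


def describe_run(input_bytes, parameters, seed):
    """Collect the context a re-runner would need, grouped as on the slide."""
    return {
        "materials": {
            "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
            "parameters": parameters,
            "seed": seed,
        },
        "instruments": {
            "python": platform.python_version(),
            "script": sys.argv[0],
        },
        "laboratory": {
            "os": platform.system(),
            "machine": platform.machine(),
        },
    }


def run_experiment(input_bytes, parameters, seed):
    """Stand-in analysis: a seeded random sample of the input bytes."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the result
    sample = rng.sample(list(input_bytes), parameters["sample_size"])
    return {"output": sample, "context": describe_run(input_bytes, parameters, seed)}


if __name__ == "__main__":
    result = run_experiment(b"ACGTACGTAC", {"sample_size": 3}, seed=42)
    print(json.dumps(result["context"], indent=2))
```

Recording the seed and an input checksum covers the "materials"; the interpreter version and platform are only a thin slice of the "instruments" and "laboratory", which in practice also include library versions and external services.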


Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

Reproducibility Framework
transparency, dependencies, steps, features, provenance trace
portability, robustness, preservation
access (available), description (intelligible)
standards (common APIs), licensing, standards (common metadata)
change management (versioning), packaging
Machine actionable

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading
Archived Record
Retain the Process, Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though
docker.com

Reproduce by Running
Active Instrument
Retain the bits

service
Science as a Service
Integrative frameworks
Open Source
Workflows, Scripts
Virtual Machines
Portable Packaging

Portability
Transparency

ReproZip
Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument
Example data and config, Components, Plug-ins, Versions

Workflow System Instrument
Software package

Workflow Runs, Data and configs, Provenance logs
Study

Shared Repository
Personal Notebook
Community Registry
Publishing Resource

standards: Adobe UCF, ORE, PROV, ODF
formats
api

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
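
The container-plus-manifest idea can be sketched in a few lines: one ZIP file holding the resources and a manifest that lists them. This is a simplified illustration only. The `manifest.json` file name, the `aggregates` / `uri` / `sha256` keys and the function names are assumptions made for the example; the Research Object Bundle and OMEX specifications define their own manifest locations and vocabularies, which should be checked before writing real bundles.

```python
import hashlib
import io
import json
import zipfile


def build_bundle(files):
    """Pack files plus a manifest describing them into one ZIP container.

    `files` maps archive paths to bytes. The manifest layout is illustrative.
    """
    manifest = {
        "aggregates": [
            {"uri": "/" + path, "sha256": hashlib.sha256(data).hexdigest()}
            for path, data in sorted(files.items())
        ]
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path, data in files.items():
            bundle.writestr(path, data)
    return buffer.getvalue()


def verify_bundle(bundle_bytes):
    """Return True when every aggregated file matches its recorded checksum."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        manifest = json.loads(bundle.read("manifest.json"))
        return all(
            hashlib.sha256(bundle.read(entry["uri"].lstrip("/"))).hexdigest()
            == entry["sha256"]
            for entry in manifest["aggregates"]
        )
```

A call such as `build_bundle({"data/input.csv": b"a,b\n1,2\n", "scripts/analyse.py": b"print('hi')\n"})` yields bytes that can be written to disk and later checked with `verify_bundle`.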

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check


Tribal Behaviour

» Gangs share but not with the public
» Tribal behaviours
  › Modellers share more than Experimentalists
  › Experimentalists reuse models more than Modellers
» Trading behaviours
  › Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky
reduce the friction

instrumentation
span RARE and FAIR

Optimising The Neylon Equation

Interface Framing
» Limited scheduled sharing choices
  › Never say never
» “Citable” not “Shared”
» Feedback
  › Guilt tripping
  › Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship
Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014; 406 respondents covering representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible
[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles
24 journals had citation policy

BUT……

two years time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group

http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
CaroleAnneGoble


Bramhall et al., Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015

“Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…”

http://www.nature.com/news/male-researchers-stress-out-rodents-1.15106

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”

David Donoho, “Wavelab and Reproducible Research”, 1995

Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations, Tools and apps, services
Codes, code libraries
Workflows, scripts
System software, Infrastructure, Compilers, hardware

Morin et al., Shining Light into Black Boxes, Science 2012, 336(6078), 159-160
Ince et al., The case for open computer programs, Nature 482, 2012

50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads
31: no sw version, parameters, exact version of genomic reference sequence
26: no access to primary data sets

Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)

Broken software Broken science

» Geoffrey Chang, Scripps Institute
» Homemade data-analysis program inherited from another lab
» Flipped two columns of data, inverting the electron-density map used to derive protein structure
» Retract 3 Science papers and 2 papers in other journals
» One paper cited by 364

The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science, 22 December 2006, vol 314 no 5807, 1856-1857
http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
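
Why would exchanging two columns invert a structure? A toy geometric analogue makes the point: swapping two coordinate columns is a reflection, so the handedness of every set of points changes sign. The sketch below is an illustration under stated assumptions, with hypothetical function names and made-up "atom" coordinates; it is not the program described in the retraction report.

```python
def signed_volume(a, b, c, d):
    """Signed volume (times 6) of tetrahedron abcd; its sign encodes handedness."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    # Scalar triple product u . (v x w), expanded along the first row.
    return (
        u[0] * (v[1] * w[2] - v[2] * w[1])
        - u[1] * (v[0] * w[2] - v[2] * w[0])
        + u[2] * (v[0] * w[1] - v[1] * w[0])
    )


def swap_columns(points, i, j):
    """Return a copy of the points with coordinate columns i and j exchanged."""
    swapped = []
    for p in points:
        q = list(p)
        q[i], q[j] = q[j], q[i]
        swapped.append(tuple(q))
    return swapped


# Four made-up points forming a right-handed tetrahedron.
atoms = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
original = signed_volume(*atoms)                     # 1
flipped = signed_volume(*swap_columns(atoms, 0, 1))  # -1: the mirror image
```

A regression test asserting the sign of such a quantity on a known reference structure is one cheap guard against this class of error.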

Software making practices

“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”

2000 scientists. JE Hannay et al., “How Do Scientists Develop and Use Scientific Software?”, Proc ICSE Workshop Software Eng for Computational Science and Eng, 2009, pp 1–8

republic of science

regulation of science

institution cores libraries

Merton’s four norms of scientific behaviour (1942)

public services

Tools, Standards
Machine actionable
Formats, Reporting, Policies, Practices

Record and Automate Everything

Potential Trace Heaven Folks

recomputation.org

sciencecodemanifesto.org

Honest Error Science is messy

Inherent

Reinhart/Rogoff Austerity economics
Thomas Herndon

Nature, Oct ’12

Zoë Corbyn

Fraud

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne
Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”
» Tainted resources
» Black boxes
» Poor Reporting
» Unavailable resources, results, data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)


Page 8: I conference2015 goble-finalupload

ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo

David Donoho ldquoWavelab and Reproducible Researchrdquo 1995

Datasets Data collectionsStandard operating proceduresSoftware algorithmsConfigurations Tools and apps servicesCodes code librariesWorkflows scriptsSystem software Infrastructure Compilers hardware

Morin et al Shining Light into Black Boxes Science 2012 336(6078) 159-160 Ince et al The case for open computer programs Nature 482 2012

50 papers randomly chosen from 378

manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads

31 no sw version parameters exact

version of genomic reference sequence

26 no access to primary data sets

Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)

Broken software Broken science

raquo Geoffrey Chang Scripps Institute

raquo Homemade data-analysis program inherited from another lab

raquo Flipped two columns of data inverting the electron-density map used to derive protein structure

raquo Retract 3 Science papers and 2 papers in other journals

raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

ldquoAs a general rule

researchers do not

test or document their

programs rigorously

and they rarely

release their codes

making it almost

impossible to

reproduce and verify

published results

generated by

scientific softwarerdquo

2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

Potential Trace Heaven Folks

recomputationorg

sciencecodemanifestoorg

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at Scale

More on Models

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups: Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It's all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UniProt): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013
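
The resolution choices on the "RO Resolution and Citation" slide (defend a snapshot, locate the most recent, reuse a specific version) can be illustrated with a minimal sketch. This is not taken from any Research Object platform: the class `ResearchObjectRegistry`, the `Snapshot` record and the identifier strings are hypothetical names introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One immutable release of a research object (hypothetical structure)."""
    version: int
    content_id: str  # e.g. a checksum of the packaged content


class ResearchObjectRegistry:
    """Toy in-memory registry: every release is kept, none is overwritten."""

    def __init__(self):
        self._versions = {}

    def release(self, ro_id, content_id):
        # Append a new snapshot; version numbers start at 1.
        history = self._versions.setdefault(ro_id, [])
        snapshot = Snapshot(len(history) + 1, content_id)
        history.append(snapshot)
        return snapshot

    def resolve(self, ro_id, version=None):
        # No version given: "locate it" (most recent release).
        # Version given: "defend it" or "reuse it" (a fixed snapshot).
        history = self._versions[ro_id]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{ro_id} has no version {version}")
        return history[version - 1]
```

A citation would carry the identifier plus a version, so that a reader can fetch exactly what was cited, while the bare identifier keeps pointing at the evolving object.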

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data. Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency: Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: "do again", "return to original state"

"show A is true by doing B"

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online; Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

"The questions don't change but the answers do" Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online; Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
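
As a rough illustration of treating a computational run as instrument, materials and method, the sketch below records the input data, output data, parameters and a little of the "laboratory" (interpreter version and platform) for one run. It is a minimal sketch, not the Research Object model itself: `RunRecord`, `sha256_of` and `record_run` are hypothetical names, and a real trace would also capture library versions and random seeds.

```python
import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """Minimal description of one computational run (hypothetical structure)."""
    input_sha256: str   # materials: fingerprint of the input data
    output_sha256: str  # fingerprint of the output data
    parameters: dict    # config / parameters passed to the method
    python_version: str = field(default_factory=lambda: sys.version.split()[0])
    platform_name: str = field(default_factory=platform.platform)


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def record_run(method, input_bytes: bytes, parameters: dict):
    """Run `method` on the input and return (output, record of what happened)."""
    output_bytes = method(input_bytes, **parameters)
    record = RunRecord(
        input_sha256=sha256_of(input_bytes),
        output_sha256=sha256_of(output_bytes),
        parameters=dict(parameters),
    )
    return output_bytes, record


def record_as_json(record: RunRecord) -> str:
    """Serialise the record so it can be stored next to the results."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)
```

Comparing the output digest of a later rerun against the stored record gives a cheap check on whether the lab has drifted.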

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency

dependencies

steps, features

provenance trace

portability

robustness

preservation

access: available

description: intelligible

standards: common APIs

licensing

standards: common metadata

change management: versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance - Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P. et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading: Archived Record. Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument. Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J Zhao, G Klyne, M Gamble, CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
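
The container-plus-manifest idea can be sketched in a few lines. The example below packs some files into a ZIP with a `mimetype` entry and a `.ro/manifest.json` that lists what is aggregated. The media type string, the manifest path, the `@context` URL and the `aggregates`/`uri` keys follow my reading of the Research Object Bundle specification linked above and should be treated as assumptions to check against it; `build_bundle` is a hypothetical helper, not part of any RO tooling.

```python
import io
import json
import zipfile

# Assumed from the RO Bundle specification; verify before relying on it.
RO_BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_bundle(resources):
    """Pack {path: bytes} into an in-memory ZIP with a manifest listing each path."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry is written first and stored uncompressed
        # (ZipInfo defaults to ZIP_STORED).
        archive.writestr(zipfile.ZipInfo("mimetype"), RO_BUNDLE_MIMETYPE)
        archive.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for path in sorted(resources):
            archive.writestr(path, resources[path],
                             compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Because the manifest travels inside the container, a platform that only understands ZIP can still carry the bundle, while an RO-aware platform can read the manifest to find out what the files are.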

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check


Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration - complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > "my gang" management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News; Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011/

Drivers

love money

fame duty

fear, time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» "Citable" not "Shared"

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

"ResearchBitCoin"

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles

24 journals had citation policy

BUT……

two years time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester's Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Page 10: I conference2015 goble-finalupload

Software making practices

ldquoAs a general rule

researchers do not

test or document their

programs rigorously

and they rarely

release their codes

making it almost

impossible to

reproduce and verify

published results

generated by

scientific softwarerdquo

2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

Potential Trace Heaven Folks

recomputationorg

sciencecodemanifestoorg

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data. Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency: Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay: materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup
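
The four-part breakdown above (methods, materials, instruments, laboratory) can be read as a completeness checklist for a computational experiment. The sketch below is an illustration under that assumption, not a model taken from the talk or from any Research Object tool; the class name `ComputationalExperiment`, the helper `missing_parts` and all example values are hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ComputationalExperiment:
    """The four kinds of ingredient a reproducible computational study records."""
    methods: list = field(default_factory=list)      # techniques, algorithms, spec of the steps
    materials: list = field(default_factory=list)    # datasets, parameters, algorithm seeds
    instruments: list = field(default_factory=list)  # codes, services, scripts, libraries
    laboratory: list = field(default_factory=list)   # sw/hw infrastructure, platforms


def missing_parts(experiment):
    """Return the names of the categories that have nothing recorded yet."""
    return [f.name for f in fields(experiment) if not getattr(experiment, f.name)]


# Hypothetical study that recorded everything except its execution environment.
study = ComputationalExperiment(
    methods=["alignment workflow, steps 1-4"],
    materials=["input.csv", "random seed 42"],
    instruments=["analysis script v1.2"],
)
gaps = missing_parts(study)
```

Here `gaps` names the laboratory as the unrecorded part, which is the piece most likely to decay once the article is submitted.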

Research Environment: submit article and move on…

Publication Environment: publish article

[Adapted Freire 2013]

transparency: dependencies, steps, features, provenance trace

portability

robustness

preservation

access: available

description: intelligible

standards: common APIs

licensing

standards: common metadata

change management: versioning

packaging

Machine actionable

Reproducibility Framework

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al, LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading: Archived Record. Retain the Process/Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument. Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J Zhao, G Klyne, M Gamble, CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
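
The container-plus-manifest pattern named above can be sketched in a few lines. This is a simplified illustration of the general idea, not an implementation of the RO Bundle or COMBINE/OMEX specifications: the manifest path `manifest.json`, its keys, the function names `write_bundle` and `read_manifest`, and the example file names are all assumptions made for the example.

```python
import io
import json
import zipfile


def write_bundle(target, files, metadata):
    """Write the files plus a JSON manifest that lists them into one ZIP container."""
    manifest = {
        "metadata": metadata,
        "aggregates": [
            {"path": path, "size": len(data)} for path, data in sorted(files.items())
        ],
    }
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path, data in sorted(files.items()):
            bundle.writestr(path, data)
    return manifest


def read_manifest(source):
    """Read back only the manifest, without unpacking the aggregated files."""
    with zipfile.ZipFile(source) as bundle:
        return json.loads(bundle.read("manifest.json"))


# Hypothetical modelling project packed into an in-memory container.
buffer = io.BytesIO()
write_bundle(
    buffer,
    {"model.xml": b"<sbml/>", "data/run1.csv": b"t,x\n0,1\n"},
    {"title": "Example modelling project", "creator": "A. Modeller"},
)
buffer.seek(0)
manifest = read_manifest(buffer)
```

Keeping the manifest at a known path lets a tool inspect what a bundle aggregates, and who made it, before deciding whether to unpack or run anything.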

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles

24 journals had citation policy

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

CaroleAnneGoble

republic of science

regulation of science

institution cores libraries

Merton’s four norms of scientific behaviour (1942)

public services

Tools, Standards, Machine actionable, Formats, Reporting, Policies, Practices

Record and Automate Everything

Potential Trace Heaven Folks

recomputation.org

sciencecodemanifesto.org

Honest Error

Science is messy

Inherent

Reinhart/Rogoff austerity economics, Thomas Herndon

Nature, Oct ’12

Zoë Corbyn

Fraud

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources, results, data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

Morrison

Do a Replication Study? No thanks! Not FAIR

Hard, resource intensive

Unrecognised, trolled

Just gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

Page 12: I conference2015 goble-finalupload

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

Potential Trace Heaven Folks

recomputationorg

sciencecodemanifestoorg

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit article and move on…

Publication Environment

publish article

[Adapted Freire 2013]

transparency / dependencies

steps, features / provenance trace

portability

robustness

preservation

access / available

description / intelligible

standards / common APIs

licensing

standards / common metadata

change management / versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading: Archived Record. Retain the Process / Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though (docker.com)

Reproduce by Running: Active Instrument. Retain the bits

service / Science as a Service

Integrative frameworks

Open Source

Workflows / Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows / makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config / Components / Plug-ins / Versions

Workflow System Instrument

Software package

Workflow Runs / Data and configs / Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource


standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J Zhao, G Klyne, M Gamble, CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
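The Container + Manifest idea on this slide (one archive file holding the resources plus a manifest that lists them) can be sketched in a few lines. This is only an illustration: the manifest file name, the `aggregates` key and the checksum field are assumptions made for the example, and it does not follow the actual RO Bundle or OMEX schemas. The functions `build_bundle` and `verify_bundle` are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
import zipfile
from pathlib import Path


def build_bundle(files: dict[str, bytes], bundle_path: Path) -> dict:
    """Write the files plus a JSON manifest into one ZIP container.

    The manifest layout here is an illustrative assumption, not a real schema.
    """
    manifest = {
        "aggregates": [
            {"path": name, "sha256": hashlib.sha256(data).hexdigest()}
            for name, data in sorted(files.items())
        ]
    }
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, data in files.items():
            zf.writestr(name, data)
    return manifest


def verify_bundle(bundle_path: Path) -> bool:
    """Check every listed file is present and matches its recorded checksum."""
    with zipfile.ZipFile(bundle_path) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        names = set(zf.namelist())
        return all(
            entry["path"] in names
            and hashlib.sha256(zf.read(entry["path"])).hexdigest() == entry["sha256"]
            for entry in manifest["aggregates"]
        )


# Made-up example content: a model file and a script.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp) / "study.zip"
    manifest = build_bundle({"model.xml": b"<model/>", "run.py": b"print('hi')"}, bundle)
    ok = verify_bundle(bundle)
print(ok)
```

Recording a checksum per aggregated file is one simple way to notice later that a bundle has decayed or been altered; a real platform would carry much richer metadata in the manifest.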

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

  › Modellers share more than Experimentalists

  › Experimentalists reuse models more than Modellers

» Trading behaviours

  › Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > "my gang" management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation / span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

  › Never say never

» "Citable" not "Shared"

» Feedback

  › Guilt tripping

  › Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship / Research Currencies

"ResearchBitCoin"

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible [1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable / 78 credit / 37 formal citation / 5 actual version

90 Bio articles / 24 journals had citation policy

BUT……

two years time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester's Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

CaroleAnneGoble


Record and Automate Everything

Potential Trace Heaven Folks

recomputation.org

sciencecodemanifesto.org

Honest Error Science is messy

Inherent

Reinhart/Rogoff Austerity economics

Thomas Herndon

Nature, Oct '12

Zoë Corbyn

Fraud

"I can't immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper."

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes "wrong"

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources/results: data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

When research goes "wrong"

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study? No thanks. Not FAIR

Hard, resource intensive

Unrecognised, trolled

Just gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

Process at Scale / More on Models

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups: Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It's all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case

Research Objects

Compound Interconnected Investigations, Research Products

Multi-various Products / Platforms / Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

  › Defend it (snapshot)

  › Locate it (most recent)

  › Reuse it (a version, a component)

  › Credit it (contributory authorship)

  › Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013
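The distinction between "Defend it (snapshot)" and "Locate it (most recent)" amounts to resolving one identifier either to a fixed release or to the latest one. A minimal in-memory sketch, assuming hypothetical `Snapshot` and `Registry` classes and a made-up identifier scheme (none of these are part of the Research Object model as presented):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    ro_id: str     # identifier of the evolving Research Object
    version: int   # release number, starting at 1
    content: str   # stand-in for the packaged content


class Registry:
    """Hypothetical resolver; a real service would persist and mint identifiers."""

    def __init__(self) -> None:
        self._versions: dict[str, list[Snapshot]] = {}

    def release(self, ro_id: str, content: str) -> Snapshot:
        """Freeze the current content as the next numbered snapshot."""
        history = self._versions.setdefault(ro_id, [])
        snap = Snapshot(ro_id, len(history) + 1, content)
        history.append(snap)
        return snap

    def resolve(self, ro_id: str, version: int | None = None) -> Snapshot:
        """Return a fixed snapshot if a version is given, else the most recent."""
        history = self._versions[ro_id]
        if version is None:
            return history[-1]
        return history[version - 1]


registry = Registry()
registry.release("ro:study-1", "initial analysis")
registry.release("ro:study-1", "analysis with revised data")

latest = registry.resolve("ro:study-1")             # locate it (most recent)
cited = registry.resolve("ro:study-1", version=1)   # defend it (snapshot)
print(latest.version, cited.content)
```

Making snapshots immutable (`frozen=True`) reflects the idea that a cited release should not change underneath the citation while the object itself keeps evolving.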

means

ends

driver

Research Object packages: codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B, Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is], F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons: versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]


Page 14: I conference2015 goble-finalupload

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn’t work out all the details

I didn’t actually write the code -- my student did

My competitors would be unfair to me

It’s valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles; 24 journals had citation policy

BUT……

two years’ time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

CaroleAnneGoble


“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources/results: data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

When research goes “wrong”

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study? No thanks! Not FAIR

Hard, resource intensive

Unrecognised, trolled

Just gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

Process at Scale

More on Models

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

https://doi.org/10.15490/seek.1.investigation.56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups

Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It’s all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It’s all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments

Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227
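The four parts named on this slide (input data, software, output data, config parameters) can be made checkable with a small fingerprinting sketch. This is an illustration under stated assumptions rather than anything the talk prescribes: SHA-256 digests are our choice, and `fingerprint`, `describe_run` and `compare_runs` are hypothetical helper names.

```python
import hashlib
import json


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of some bytes."""
    return hashlib.sha256(content).hexdigest()


def describe_run(input_data: bytes, software: bytes, config: dict, output_data: bytes) -> dict:
    """Record one digest per part of a computational run."""
    # The config is serialised with sorted keys so that key order does not change the digest.
    config_bytes = json.dumps(config, sort_keys=True).encode("utf-8")
    return {
        "input_data": fingerprint(input_data),
        "software": fingerprint(software),
        "config": fingerprint(config_bytes),
        "output_data": fingerprint(output_data),
    }


def compare_runs(original: dict, rerun: dict) -> list:
    """Return the sorted names of the parts whose digests differ between two runs."""
    return sorted(name for name in original if original[name] != rerun.get(name))
```

An empty list from `compare_runs` means the rerun matched bit for bit; a non-empty list names which part changed, which is a starting point for asking whether a difference is a repeat, a replication or something else.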

1. Science Changes. So does the Lab

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl. Conf. on e-Science, 2012

2. Instruments Break, Labs Decay: materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

Config Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227

Research Environment

submit article and move on…

publish article

Publication Environment

Research Environment

publish article

Publication Environment

submit article and move on…

[Adapted Freire 2013]

transparency, dependencies

steps, features, provenance trace

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J Zhao, G Klyne, M Gamble, CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013


Page 16: I conference2015 goble-finalupload

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

httpwwwresearchobjectorg

httpwwwwf4ever-projectorg

httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contacthellip

Professor Carole Goble

The University of Manchester UK

carolegoblemanchesteracuk

httpssitesgooglecomsitecarolegoble

CaroleAnneGoble

Page 17: I conference2015 goble-finalupload

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound, Interconnected Investigations, Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward

Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013
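The resolution and citation roles listed above (defend a snapshot, locate the most recent, reuse a version, credit contributors, cross link) can be sketched as a small data model. This is only an illustration: the class names, fields and the citation string format are hypothetical and are not taken from any Research Object specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One immutable, citable version of a Research Object (hypothetical model)."""
    version: int
    contributors: Tuple[str, ...]   # people credited for this version
    links: Tuple[str, ...] = ()     # identifiers of related Research Objects


@dataclass
class ResearchObject:
    identifier: str
    snapshots: List[Snapshot] = field(default_factory=list)

    def release(self, contributors, links=()) -> Snapshot:
        """Freeze the current state as the next numbered snapshot."""
        snap = Snapshot(len(self.snapshots) + 1, tuple(contributors), tuple(links))
        self.snapshots.append(snap)
        return snap

    def resolve(self, version: Optional[int] = None) -> Snapshot:
        """Locate the most recent snapshot, or a specific one to defend or reuse."""
        if not self.snapshots:
            raise LookupError(f"{self.identifier} has no released snapshot")
        if version is None:
            return self.snapshots[-1]
        for snap in self.snapshots:
            if snap.version == version:
                return snap
        raise LookupError(f"{self.identifier} has no version {version}")

    def cite(self, version: Optional[int] = None) -> str:
        """Credit the contributors of the resolved snapshot (illustrative format)."""
        snap = self.resolve(version)
        return f"{', '.join(snap.contributors)}. {self.identifier} (version {snap.version})"
```

Calling `resolve()` with no argument follows the evolving object, while `resolve(1)` pins the version that a published claim rests on.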

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data. Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code. Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is] F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness, tolerance

verification, compliance

validation, assurance

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online. Peng, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed. Updated resources

Uncertainty

BioSTIF

Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay: materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online. Peng, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227
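One way to read the breakdown above is as a record of what went into a computational run: input data, config and parameters, software, and output data. The sketch below is an assumption-laden illustration, not a method from the talk: the function names and the record layout are hypothetical, and it only handles JSON-serialisable values. It fingerprints each part so that two runs can be compared and the parts that changed can be named.

```python
import hashlib
import json


def fingerprint(value) -> str:
    """Stable SHA-256 digest of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_and_record(method, input_data, config, software_version):
    """Run the method and record digests of its materials and its result."""
    output = method(input_data, **config)
    return {
        "input": fingerprint(input_data),    # materials: datasets
        "config": fingerprint(config),       # materials: parameters, seeds
        "software": software_version,        # instrument: code version label
        "output": fingerprint(output),       # result to compare on rerun
    }


def differences(record_a, record_b):
    """Names of the recorded parts that differ between two runs, sorted."""
    return sorted(key for key in record_a if record_a[key] != record_b[key])


def scaled_mean(values, scale=1.0):
    """Toy method used only for the example."""
    return scale * sum(values) / len(values)


original = run_and_record(scaled_mean, [1, 2, 3], {"scale": 2.0}, "1.0")
rerun = run_and_record(scaled_mean, [1, 2, 3], {"scale": 2.0}, "1.0")
changed = run_and_record(scaled_mean, [1, 2, 3], {"scale": 3.0}, "1.1")
```

Here `differences(original, rerun)` is empty, while `differences(original, changed)` names the config, output and software. A record like this says nothing about the laboratory layer (operating system, libraries, services), which is where the decay described above tends to start.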

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

Reproducibility Framework

transparency, dependencies, steps, features, provenance trace, portability, robustness, preservation, access, available, description, intelligible, standards, common APIs, licensing, standards, common metadata, change management, versioning, packaging

Machine actionable

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al, LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading: Archived Record. Retain the Process/Code

The IT Crowd, Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument. Retain the bits

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J Zhao, G Klyne, M Gamble, CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop 2013

Platform profiles

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle

Bergmann et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
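The container-plus-manifest idea can be sketched with nothing more than a ZIP file. The example below is a minimal sketch that assumes the layout of the Research Object Bundle specification linked above (a `mimetype` entry stored first and uncompressed, and a JSON-LD manifest at `.ro/manifest.json` with an `aggregates` list). The media type string, context URL and manifest keys are written from memory of that specification and should be checked against it. `build_bundle` and the file names are hypothetical.

```python
import io
import json
import zipfile

# Assumed values; verify against the Research Object Bundle specification.
BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"
CONTEXT_URL = "https://w3id.org/bundle/context"


def build_bundle(files):
    """Package {path: bytes} into a bundle-style ZIP and return its bytes."""
    manifest = {
        "@context": [CONTEXT_URL],
        "id": "/",
        # Every packaged resource is listed so a reader can discover it
        # from the manifest alone.
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        # Container convention: the media type is the first entry and is
        # stored without compression so tools can sniff the file type.
        bundle.writestr("mimetype", BUNDLE_MIMETYPE,
                        compress_type=zipfile.ZIP_STORED)
        for path in sorted(files):
            bundle.writestr(path, files[path],
                            compress_type=zipfile.ZIP_DEFLATED)
        bundle.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2),
                        compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


bundle_bytes = build_bundle({
    "data/input.csv": b"a,b\n1,2\n",
    "scripts/run.py": b"print('hi')\n",
})
```

The manifest here only aggregates; a fuller bundle would also carry provenance and annotations, which this sketch leaves out.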

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share, but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News. Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014. 406 respondents covering representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, In review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles, 24 journals had citation policy

BUT……

two years time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

[email protected]

https://sites.google.com/site/carolegoble

CaroleAnneGoble

Page 19: I conference2015 goble-finalupload

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration - complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014. The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology. In review.

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles; 24 journals had citation policy

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

[email protected]

https://sites.google.com/site/carolegoble

CaroleAnneGoble

Page 20: I conference2015 goble-finalupload

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups

Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It's all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments

Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UNIPROT): evolving

Goble, De Roure, Bechhofer. Accelerating Knowledge Turns. IC3K 2013
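The difference between defending a snapshot and locating the most recent version can be sketched with a small in-memory record. This is only an illustration of the resolution behaviour listed on the slide; the class name `ResearchObjectRecord`, the `identifier/vN` citation form and the example identifier are all hypothetical and not part of any Research Object specification.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchObjectRecord:
    """Toy record that keeps every released snapshot of one Research Object."""

    identifier: str
    snapshots: list = field(default_factory=list)  # ordered oldest to newest

    def release(self, content):
        """Store a new snapshot and return a citable, versioned identifier."""
        self.snapshots.append(content)
        return f"{self.identifier}/v{len(self.snapshots)}"

    def resolve(self, version=None):
        """Return the latest snapshot, or a specific one if `version` is given."""
        if not self.snapshots:
            raise LookupError(f"{self.identifier} has no released versions")
        if version is None:
            return self.snapshots[-1]  # "locate it": most recent
        if not 1 <= version <= len(self.snapshots):
            raise LookupError(f"{self.identifier} has no version {version}")
        return self.snapshots[version - 1]  # "defend it": fixed snapshot
```

Citing the versioned identifier pins the exact content that was reviewed, while resolving without a version follows the object as it evolves.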

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts, exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R. Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify

[Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab.

“The questions don’t change but the answers do” - Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al. Why workflows break - Understanding and combating decay in Taverna workflows. 8th Intl Conf. e-Science, 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency, dependencies

steps, features, provenance trace

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

Page 22: I conference2015 goble-finalupload

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code. Can change the definition of a figure, and ultimately the journal article.

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model; the figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness, tolerance

verification, compliance

validation, assurance

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab.

“The questions don’t change but the answers do” (Dan Reed)

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al., Why workflows break: Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay: materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup
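
As a rough illustration of the Materials and Laboratory categories above, a computational experiment can write down its own seed, parameters and platform alongside its result. This is a minimal sketch under assumptions: `capture_setup` and `run_experiment` are hypothetical helper names, the "experiment" is a toy average of pseudo-random draws, and only Python standard-library calls are used.

```python
import json
import platform
import random
import sys


def capture_setup(seed: int, parameters: dict) -> dict:
    """Record materials (seed, parameters) and laboratory (platform) details."""
    return {
        "materials": {"seed": seed, "parameters": dict(parameters)},
        "laboratory": {
            "python": sys.version.split()[0],
            "implementation": platform.python_implementation(),
            "os": platform.system(),
            "machine": platform.machine(),
        },
    }


def run_experiment(seed: int, parameters: dict) -> dict:
    """Toy experiment: draw n pseudo-random numbers and average them."""
    # A private generator, so no other code can disturb the sequence of draws.
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(parameters["n"])]
    return {"setup": capture_setup(seed, parameters), "mean": sum(draws) / len(draws)}


if __name__ == "__main__":
    # The printed record travels with the result instead of living in someone's head.
    print(json.dumps(run_experiment(seed=42, parameters={"n": 100}), indent=2))
```

Rerunning with the same seed and parameters on the same interpreter should give the same mean; the recorded laboratory section is what lets a later reader notice when the lab itself has changed.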


Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

Reproducibility Framework

Figure labels: transparency, dependencies, steps, features, provenance trace, portability, robustness, preservation, access, available, description, intelligible, standards, common APIs, licensing, standards, common metadata, change management, versioning, packaging, machine actionable

Reporting

Documentation

Provenance: Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P. et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading: Archived Record. Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument. Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J. Zhao, G. Klyne, M. Gamble, C.A. Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
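
The Container plus Manifest idea can be sketched as a ZIP file that carries a machine-readable list of what it aggregates. The example below is a hedged approximation, not a conforming implementation: the media type string, the `.ro/manifest.json` path, the context URL and the `aggregates`/`uri` field names reflect one reading of the Research Object Bundle draft and should be checked against the specification, while `build_bundle` and `read_manifest` are hypothetical helper names.

```python
import io
import json
import zipfile

# Assumed from the RO Bundle draft; verify against the specification before relying on it.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_bundle(files: dict) -> bytes:
    """Pack {path: bytes} into a ZIP with a manifest listing every aggregated file."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],  # assumed context URL
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The container type is written first and uncompressed so tools can sniff it.
        archive.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for path in sorted(files):
            archive.writestr(path, files[path])
    return buffer.getvalue()


def read_manifest(bundle: bytes) -> dict:
    """Load the manifest back out of a bundle held in memory."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        return json.loads(archive.read(".ro/manifest.json"))
```

A call such as `build_bundle({"data.csv": b"a,b\n1,2\n", "run.py": b"print(1)\n"})` returns the bytes of an archive whose manifest lists both files, which is the "one file to share" pattern that the OMEX archive also follows.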

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration: complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code, my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love, money

fame, duty

fear, time/effort

shame, duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles, 24 journals had citation policy

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble


Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups: Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust.

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It's all about The Trust.


Page 24: I conference2015 goble-finalupload

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles, 24 journals had citation policy

BUT……

two years time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It's all about The Trust

Jam today, Jam tomorrow, Jam for all, Just enough Jam, Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments

Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi–typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)
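
The difference between defending a snapshot and locating the most recent version can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ResearchObjectRecord` class, the `ro:example` identifier and the `/vN` suffix are hypothetical names invented here, not part of any Research Object specification or service.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchObjectRecord:
    """Hypothetical registry entry: one identifier, many frozen snapshots."""
    identifier: str
    snapshots: list = field(default_factory=list)  # ordered oldest to newest

    def release(self, content: str, contributors: list) -> str:
        """Freeze a new snapshot and return a versioned identifier to cite."""
        version = len(self.snapshots) + 1
        self.snapshots.append(
            {"version": version, "content": content,
             "contributors": list(contributors)}
        )
        return f"{self.identifier}/v{version}"

    def resolve(self, version=None) -> dict:
        """Return the requested snapshot, or the most recent if none is given."""
        if not self.snapshots:
            raise LookupError("no snapshots released yet")
        if version is None:
            return self.snapshots[-1]
        if not 1 <= version <= len(self.snapshots):
            raise LookupError(f"unknown version {version}")
        return self.snapshots[version - 1]
```

Citing the versioned identifier keeps the defended snapshot fixed, while resolving the bare identifier follows the object as it evolves. Each snapshot also carries its own contributor list, which is one way to record contributory credit per release.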

Biological Study Records (eg PRIDE): stable

Biological Knowledge (eg UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify

[Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
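
One way to read the four boxes above (input data, software, output data, config and parameters) is as the things to fingerprint on every run, so that a rerun can report which part changed. A minimal Python sketch, assuming a toy `run_experiment` function standing in for the real software; all names here are hypothetical and chosen for illustration only.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Return a SHA-256 digest of a JSON-serialisable object."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_experiment(input_data, config):
    """Hypothetical stand-in for the software: scale and sum the inputs."""
    return {"total": sum(x * config["scale"] for x in input_data)}


def record_run(input_data, config, software_version):
    """Run once and keep a fingerprint of each part of the experiment."""
    output = run_experiment(input_data, config)
    return {
        "input": fingerprint(input_data),
        "config": fingerprint(config),
        "software": software_version,
        "output": fingerprint(output),
    }


def compare_runs(original, rerun):
    """List which recorded parts differ between two runs."""
    return [key for key in original if original[key] != rerun[key]]
```

An empty list from `compare_runs` means the rerun matched bit for bit. A non-empty list says where to look first, which is usually more useful than a bare "results vary".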

1. Science Changes. So does the Lab.

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

[Diagram labels: transparency, dependencies, steps, features, provenance trace, portability, robustness, preservation, access, available, description, intelligible, standards, common APIs, licensing, common metadata, change management, versioning, packaging, machine actionable]

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al, LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J Zhao, G Klyne, M Gamble, CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle

Bergmann et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
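
The container plus manifest idea above can be sketched as a ZIP archive holding the resources alongside a manifest that lists them. What follows is a simplified Python illustration only: the `manifest.json` file name and the `aggregates`, `path` and `sha256` fields are assumptions made for this sketch and should not be read as the layout defined by the Research Object Bundle or OMEX specifications.

```python
import hashlib
import io
import json
import zipfile


def build_bundle(resources: dict) -> bytes:
    """Pack resources (path -> bytes) plus a manifest into one ZIP archive."""
    manifest = {
        "aggregates": [
            {"path": path, "sha256": hashlib.sha256(data).hexdigest()}
            for path, data in sorted(resources.items())
        ]
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The manifest travels inside the container with the resources.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path, data in sorted(resources.items()):
            archive.writestr(path, data)
    return buffer.getvalue()


def verify_bundle(blob: bytes) -> bool:
    """Check every listed resource against the checksum in the manifest."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        return all(
            hashlib.sha256(archive.read(entry["path"])).hexdigest()
            == entry["sha256"]
            for entry in manifest["aggregates"]
        )
```

Keeping the manifest inside the archive means a single file can be exchanged between platforms and still describe its own contents. The checksums give the receiver a cheap integrity check before rerunning anything.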

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

Page 27: I conference2015 goble-finalupload

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
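The container-plus-manifest idea can be sketched in a few lines. The sketch assumes the layout described in the bundle specification linked above: a ZIP file whose first entry is an uncompressed `mimetype`, with a JSON manifest at `.ro/manifest.json` listing the aggregated resources. The media type string, the context URL and the manifest keys are assumptions to be checked against that specification, and the helper names are hypothetical.

```python
import json
import zipfile

# Media type assumed from the RO Bundle specification; verify before relying on it.
BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(path, resources):
    """Write a ZIP container holding the resources plus a manifest listing them.

    `resources` maps a path inside the bundle to the bytes stored there.
    """
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],  # assumed context URL
        "id": "/",
        "aggregates": [{"uri": "/" + name} for name in sorted(resources)],
    }
    with zipfile.ZipFile(path, "w") as bundle:
        # Container convention: an uncompressed `mimetype` entry comes first.
        bundle.writestr("mimetype", BUNDLE_MIMETYPE,
                        compress_type=zipfile.ZIP_STORED)
        for name in sorted(resources):
            bundle.writestr(name, resources[name],
                            compress_type=zipfile.ZIP_DEFLATED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2),
                        compress_type=zipfile.ZIP_DEFLATED)
    return manifest


def read_manifest(path):
    """Load the manifest back out of an existing bundle."""
    with zipfile.ZipFile(path) as bundle:
        return json.loads(bundle.read(".ro/manifest.json"))
```

Because the manifest travels inside the container, any platform that can open a ZIP file can discover what the bundle aggregates without a platform-specific API.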

Retro-Fitted ROs using off-the-shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid, transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love, money

fame, duty

fear, time/effort

shame, duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles, 24 journals had citation policy

BUT…

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

CaroleAnneGoble

Page 28: I conference2015 goble-finalupload

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments

Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)
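The difference between defending a snapshot and locating the most recent version can be made concrete with a small sketch. Everything here is hypothetical: the `ResearchObjectRecord` class, the `ro:example-study` identifier and the `identifier@version` citation form are illustrative choices, not part of any Research Object specification.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchObjectRecord:
    """Hypothetical registry entry: one identifier, many immutable snapshots."""
    identifier: str
    snapshots: dict = field(default_factory=dict)  # version label -> content reference
    order: list = field(default_factory=list)      # version labels, oldest first

    def release(self, version, reference):
        """Register a new snapshot; released versions can never be overwritten."""
        if version in self.snapshots:
            raise ValueError(f"version {version} is already released and is immutable")
        self.snapshots[version] = reference
        self.order.append(version)

    def resolve(self, version=None):
        """Return (citable id, reference); no version means the most recent release."""
        if not self.order:
            raise LookupError("nothing released yet")
        chosen = self.order[-1] if version is None else version
        return f"{self.identifier}@{chosen}", self.snapshots[chosen]
```

Citing `ro:example-study@v1` always resolves to the same content (defend it), while resolving the bare identifier follows the object as it evolves (locate it).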

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UniProt): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify

[Yong, Nature 485, 2012]

robustness, tolerance

verification, compliance

validation, assurance

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (software and hardware infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
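For a computational experiment, the materials, instruments and laboratory can be written down alongside the result. The sketch below is one minimal way to do that, under stated assumptions: the record layout and the function names `describe_run` and `run_experiment` are hypothetical, and the "method" is a toy seeded sampling step standing in for a real analysis.

```python
import hashlib
import json
import platform
import random
import sys


def describe_run(dataset_bytes, parameters, seed):
    """Capture the setup of one computational run as a plain dictionary."""
    return {
        # Materials: what went in (dataset fingerprint, parameters, seed).
        "materials": {
            "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
            "parameters": parameters,
            "seed": seed,
        },
        # Instruments: the code stack; here only the interpreter version.
        "instruments": {"python": sys.version.split()[0]},
        # Laboratory: the underlying platform the run executed on.
        "laboratory": {"system": platform.system(), "machine": platform.machine()},
    }


def run_experiment(dataset, parameters, seed):
    """Toy method: draw a seeded random sample and return its mean."""
    rng = random.Random(seed)
    sample = rng.sample(dataset, parameters["sample_size"])
    return sum(sample) / len(sample)


if __name__ == "__main__":
    data = list(range(100))
    params = {"sample_size": 10}
    record = describe_run(json.dumps(data).encode(), params, seed=42)
    record["result"] = run_experiment(data, params, seed=42)
    print(json.dumps(record, indent=2))
```

Rerunning with the same materials on the same instruments should repeat the result; changing the laboratory or the library versions is where the record starts to show why results vary.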

Page 29: I conference2015 goble-finalupload

Page 30: I conference2015 goble-finalupload

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

[Adapted Freire 2013]

transparency: dependencies, steps, features, provenance trace

portability

robustness

preservation

access: available

description: intelligible

standards: common APIs

licensing

standards: common metadata

change management: versioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014: 84-96
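To illustrate what going from a thick trace to a distilled report might look like, here is a minimal sketch. It is not the LabelFlow method; the trace format, field names, and `distil` function are assumptions made up for this example.

```python
def distil(trace):
    """Collapse a per-event trace into one summary row per workflow step."""
    report = {}
    for event in trace:
        row = report.setdefault(
            event["step"], {"runs": 0, "failures": 0, "seconds": 0.0}
        )
        row["runs"] += 1
        if event["status"] != "ok":
            row["failures"] += 1
        row["seconds"] += event["seconds"]
    return report


# Hypothetical thick trace: one entry per step execution.
trace = [
    {"step": "fetch", "status": "ok", "seconds": 2.0},
    {"step": "align", "status": "failed", "seconds": 5.0},
    {"step": "align", "status": "ok", "seconds": 6.5},
    {"step": "plot", "status": "ok", "seconds": 1.0},
]

for step, row in distil(trace).items():
    print(step, row)
```

The full trace stays available for anyone who needs to re-examine a run, while the summary is what a reader of the report would normally see.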

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config

Components

Plug-ins

Versions

Workflow System Instrument

Software package

Workflow Runs

Data and configs

Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
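The container-plus-manifest idea can be sketched as a ZIP archive holding the resources together with a manifest that lists them. This is only an illustration of the pattern: the manifest file name and its `aggregates` layout are simplified assumptions here, not the exact structure required by the Research Object Bundle or OMEX specifications, and the file names are hypothetical.

```python
import io
import json
import zipfile


def build_bundle(files):
    """Pack files (path -> text content) into an in-memory ZIP,
    writing a manifest first that lists every aggregated resource."""
    manifest = {"aggregates": [{"uri": "/" + path} for path in sorted(files)]}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path, content in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# Hypothetical contents of a small study.
bundle = build_bundle({
    "workflow/analysis.py": "print('analysis placeholder')\n",
    "data/input.csv": "id,value\n1,0.5\n",
})

# Reading the bundle back: the manifest tells a consumer what is inside.
with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
    listed = json.loads(archive.read("manifest.json"))["aggregates"]
    print([entry["uri"] for entry in listed])
```

Because the manifest travels inside the container, a platform that has never seen the study can still discover what the package aggregates before opening any individual file.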

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011/

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions?, J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles

24 journals had citation policy

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

CaroleAnneGoble

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure

F1000Research Living Figures: versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B, Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is], F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm. Not a static document paradigm.

Reproduce looks backwards -> Release looks forwards



Page 33: I conference2015 goble-finalupload

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by Reading: Archived Record
Retain the Process / Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)
a black box though
docker.com

Reproduce by Running: Active Instrument
Retain the bits

service
Science as a Service
Integrative frameworks
Open Source
Workflows, Scripts
Virtual Machines
Portable Packaging

Portability
Transparency

ReproZip
Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument
Example data and config, Components, Plug-ins, Versions

Workflow System Instrument
Software package

Workflow Runs
Data and configs, Provenance logs

Study

Shared Repository
Personal Notebook
Community Registry
Publishing Resource

standards
Adobe UCF, ORE, PROV, ODF
formats
api

Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles
NISO-JATS
Instrument

Container
Manifest
OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
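
The container-plus-manifest idea named on this slide can be sketched with a ZIP archive that carries a list of its own contents. This is a simplified illustration only: the file name `manifest.json`, the `aggregates` / `path` / `sha256` fields and the `pack` and `verify` helpers are assumptions made for the example and do not reproduce the Research Object Bundle or OMEX layouts.

```python
import hashlib
import io
import json
import zipfile


def pack(files: dict) -> bytes:
    """Pack named byte strings into a ZIP plus a manifest of SHA-256 checksums."""
    manifest = {
        "aggregates": [
            {"path": name, "sha256": hashlib.sha256(data).hexdigest()}
            for name, data in sorted(files.items())
        ]
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical manifest location; real formats define their own.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def verify(blob: bytes) -> bool:
    """Check that every file listed in the manifest matches its recorded checksum."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        return all(
            hashlib.sha256(archive.read(entry["path"])).hexdigest() == entry["sha256"]
            for entry in manifest["aggregates"]
        )


# Example contents are made up for illustration.
bundle = pack({"data/input.csv": b"x,y\n1,2\n", "run.py": b"print('hello')\n"})
print(verify(bundle))
```

Keeping the manifest inside the container means a single file can be moved between platforms and still say what it aggregates; checksums are one possible choice of metadata to record there.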

Retro-Fitted ROs using off the shelf platforms

Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers
Silver bullet tools
Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share, but not with the public
» Tribal behaviours
  › Modellers share more than Experimentalists
  › Experimentalists reuse models more than Modellers
» Trading behaviours
  › Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me
ME
my team
close colleagues
peers

The Research Release Creep Spiral

» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens

Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code, my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love, money
fame, duty
fear, time/effort
shame, duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices
  › Never say never
» “Citable” not “Shared”
» Feedback
  › Guilt tripping
  › Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms
Sweave

Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics

Training

56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible
[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions?, J. Assoc. for Info. Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version
90 Bio articles; 24 journals had citation policy

BUT……

two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group

http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble

Page 34: I conference2015 goble-finalupload

Focus on the figure: F1000Research Living Figures
versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code
Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B, Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is], F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm
Not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo

What IS reproducibility?
Re: “do again”, “return to original state”

“show A is true by doing B”
verify but not falsify [Yong, Nature 485, 2012]

robustness, tolerance
verification, compliance
validation, assurance



Page 36: I conference2015 goble-finalupload

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

  › Never say never

» “Citable” not “Shared”

» Feedback

  › Guilt tripping

  › Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority.

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable; 78 credit; 37 formal citation; 5 actual version

90 Bio articles; 24 journals had citation policy

BUT…

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

The ROs Meme

recompute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify

[Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab.

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

  › form or function

  › preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

[Diagram: Research Environment (submit article and move on…) and Publication Environment (publish article)]

[Adapted Freire 2013]

Reproducibility Framework

transparency

dependencies

steps, features

provenance trace

portability

robustness

preservation

access: available

description: intelligible

standards: common APIs

licensing

standards: common metadata

change management: versioning

packaging

Machine actionable

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P. et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits
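One way to make "retain the bits" checkable is to record a checksum for every file that makes up a run (input data, software, config, output data) and compare against it before a rerun. The following is a minimal sketch of that idea, not something taken from the talk; the function names `snapshot` and `changed_files` are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> dict:
    """Map each file under root (relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, recorded: dict) -> list:
    """List files whose checksum differs from the recorded snapshot,
    including files added or removed since the snapshot was taken."""
    current = snapshot(root)
    names = set(current) | set(recorded)
    return sorted(n for n in names if current.get(n) != recorded.get(n))
```

A snapshot taken when the article is submitted can be stored alongside the run; an empty result from `changed_files` later on suggests the retained bits are still the ones that produced the result. It says nothing about whether the surrounding lab (libraries, services) has decayed.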

[Diagram: approaches placed between Portability and Transparency: Science as a Service, Integrative frameworks, Open Source, Workflows/Scripts, Virtual Machines, Portable Packaging, ReproZip, Workflows/makefiles]

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J. Zhao, G. Klyne, M. Gamble, C.A. Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
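The container-plus-manifest idea can be sketched as a ZIP archive whose manifest lists the resources it aggregates. This is only an illustration loosely modelled on the Research Object bundle; the media type, the manifest path and the field names below are assumptions to be checked against the specification, and `write_bundle` and `read_manifest` are hypothetical helpers.

```python
import json
import zipfile
from pathlib import Path

# Assumed values; verify against the bundle specification before relying on them.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def write_bundle(bundle_path: Path, resources: dict) -> None:
    """Write resources (archive name -> bytes) plus a manifest into one ZIP."""
    manifest = {
        "id": "/",
        "aggregates": [{"uri": "/" + name} for name in sorted(resources)],
    }
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry goes first and is stored uncompressed.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
        for name in sorted(resources):
            archive.writestr(name, resources[name])


def read_manifest(bundle_path: Path) -> dict:
    """Load the manifest back out of a bundle."""
    with zipfile.ZipFile(bundle_path) as archive:
        return json.loads(archive.read(MANIFEST_PATH))
```

Because the manifest travels inside the container, a platform that does not understand Research Objects can still treat the bundle as an ordinary ZIP file, which is what makes retro-fitting onto existing platforms plausible.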

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROs using off-the-shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid, transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014. The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology. In review.

87 software findable; 78 credit; 37 formal citation; 5 actual version

90 Bio articles; 24 journals had citation policy

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Zhao et al. Why workflows break: Understanding and combating decay in Taverna workflows. 8th Intl. Conf. e-Science, 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011: 1226-1227
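The four headings above (Methods, Materials, Instruments, Laboratory) can be read as a checklist of what a research object needs to carry. A minimal sketch, assuming a hypothetical `ExperimentRecord` structure and `missing_parts` helper that are not part of any Research Object specification:

```python
from dataclasses import dataclass, field, fields


@dataclass
class ExperimentRecord:
    """What a research object might carry, following the slide's four headings."""
    methods: list = field(default_factory=list)      # techniques, algorithms, spec of the steps
    materials: list = field(default_factory=list)    # datasets, parameters, algorithm seeds
    instruments: list = field(default_factory=list)  # codes, services, scripts, underlying libraries
    laboratory: list = field(default_factory=list)   # sw and hw infrastructure, systems software


def missing_parts(record):
    """Return the names of headings that have nothing recorded against them."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical record, invented for this example: the laboratory is not described.
record = ExperimentRecord(
    methods=["workflow description"],
    materials=["input.csv", "random seed 42"],
    instruments=["analysis.py"],
)
gaps = missing_parts(record)
```

Here `gaps` names the heading left empty, which is the kind of omission that tends to surface only when someone else tries to rerun the experiment.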

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency: dependencies, steps, features, provenance trace

portability

robustness

preservation

access: available

description: intelligible

standards: common APIs

licensing

standards: common metadata

change management: versioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework


httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contacthellip

Professor Carole Goble

The University of Manchester UK

carolegoblemanchesteracuk

httpssitesgooglecomsitecarolegoble

CaroleAnneGoble

Page 45: I conference2015 goble-finalupload

Reproducibility Framework [Adapted Freire 2013]

transparency: dependencies, steps, features, provenance trace

portability

robustness

preservation

access: available

description: intelligible

standards: common APIs

licensing

standards: common metadata

change management: versioning

packaging

Machine actionable
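The dimensions on this slide can be read as a checklist. The following is a minimal sketch under that assumption: the dimension names are taken from the slide, while the dictionary layout, the example values, and the function name `missing_dimensions` are hypothetical and not part of any Research Object tooling.

```python
# Hypothetical sketch: dimension names come from the slide; everything else
# (data layout, example values, function name) is assumed for illustration.
FRAMEWORK_DIMENSIONS = [
    "dependencies", "steps", "provenance trace", "portability",
    "robustness", "preservation", "access", "description",
    "common APIs", "licensing", "common metadata", "versioning",
    "packaging",
]


def missing_dimensions(research_object: dict) -> list[str]:
    """Return the framework dimensions that the description leaves empty."""
    return [d for d in FRAMEWORK_DIMENSIONS if not research_object.get(d)]


# A made-up description of a piece of research, keyed by dimension name.
example = {
    "dependencies": "requirements.txt",
    "steps": "workflow definition file",
    "licensing": "CC-BY-4.0",
    "versioning": "git tag v1.0",
}

if __name__ == "__main__":
    for name in missing_dimensions(example):
        print("not addressed:", name)
```

A flat list keeps the sketch short; a real checklist would likely weight the dimensions differently per discipline and platform.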

submit article and move on...

Reporting

Documentation

Provenance - Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014: 84-96

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

Science as a Service

Integrative frameworks

Open Source

Workflows/Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows/makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
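The "Container" plus "Manifest" pairing can be illustrated with a small sketch: a ZIP file that carries the payload files together with a manifest listing them. The media type, manifest path, and JSON keys below follow my reading of the Research Object Bundle specification linked above and should be checked against it; the function name `write_bundle` and the example file names are hypothetical. An OMEX archive follows the same idea with its own manifest format, which is not shown here.

```python
import json
import zipfile

# Assumed from the RO Bundle specification; verify before relying on it.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(path, files):
    """Write a ZIP container with payload files and a manifest listing them.

    `files` maps an archive path (for example "data/results.csv") to bytes.
    Returns the manifest dictionary that was written.
    """
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + name} for name in sorted(files)],
    }
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        # The mimetype entry is written first and left uncompressed,
        # as in other ZIP-based container formats such as ODF.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name in sorted(files):
            bundle.writestr(name, files[name])
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    return manifest
```

Calling `write_bundle("study.bundle.zip", {"data/results.csv": b"a,b\n1,2\n"})` would produce one file that can be passed around as a unit, with the manifest acting as the table of contents.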

Retro-Fitted ROs using off-the-shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration - complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

[Figure labels: me, my team, close colleagues, peers]

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested/find it useful/be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code, my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable; 78 credit; 37 formal citation; 5 actual version

90 Bio articles; 24 journals had citation policy

BUT...

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact...

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Page 49: I conference2015 goble-finalupload

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority.

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard (2014). The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology. In review.

87 software findable

78 credit

37 formal citation

5 actual version

90 Bio articles

24 journals had citation policy

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

CaroleAnneGoble

Page 50: I conference2015 goble-finalupload

ReproZip

Workflows

makefiles

service

Science as a Service

Integrative frameworks

Open Source

Workflows

Scripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and config

Components

Plug-ins

Versions

Workflow System Instrument

Software package

Workflow Runs

Data and configs

Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards

Adobe UCF

ORE

PROV

ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

Page 55: I conference2015 goble-finalupload

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014. The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology, in review

87 software findable

78 credit

37 formal citation

5 actual version

90 Bio articles

24 journals had citation policy

BUT……

Page 61: I conference2015 goble-finalupload

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

httpwwwresearchobjectorg

httpwwwwf4ever-projectorg

httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]


