Research Objects for improved sharing and reproducibility Dagstuhl Perspective Workshop on the intersection between Computer Sciences and Psychology Oscar Corcho @ocorcho, http://slideshare.net/ocorcho Ontology Engineering Group Universidad Politécnica de Madrid (and the Research Object community group)

Jan 28, 2018

Science

Page 1: Research Objects for improved sharing and reproducibility

Research Objects for improved sharing and reproducibility

Dagstuhl Perspective Workshop on the intersection between Computer Sciences and Psychology

Oscar Corcho

@ocorcho, http://slideshare.net/ocorcho

Ontology Engineering Group

Universidad Politécnica de Madrid

(and the Research Object community group)

Page 2: Research Objects for improved sharing and reproducibility

My motivation

Page 3: Research Objects for improved sharing and reproducibility

Some memos from our futuristic scenario

• Don’t publish, release (ack. Carole Goble), reloaded (ack. Paul Groth)

• Don’t just read a paper, but also view it, play with it, and whatever else

• Convert passive papers into active scientific storytellers and alert systems

Page 4: Research Objects for improved sharing and reproducibility

A few quotes from this week

• Data (and method) sharing
  • Dietrich: The method for investigation is not clearly described
  • Eric: Provide links between articles and datasets (interlinking of scholarly content)
  • William: methods are normally reduced to a tiny piece of text

• Reproducibility
  • Working group on “the present”: Crisis of replicability is driving increased concern and interest
  • Eric: 70% of science articles are not reproducible

Page 5: Research Objects for improved sharing and reproducibility

Act 1

Data and method sharing

Page 6: Research Objects for improved sharing and reproducibility

One of the many origins of “Don’t Publish, Release”

• A day in Granada… (January, 2012)

• Let’s get some of the interesting discussions on the Force11 Dagstuhl meeting into practice

Page 7: Research Objects for improved sharing and reproducibility

One of the origins of “Don’t Publish, Release”

[Diagram: lifecycle of a Research Object, involving a Scientist and a Librarian/Curator]

• Scientist: Live RO
• “My supervisor calls me to report my work”: RO snapshot <<copy>>. Identified by a URI; some metadata; some curation; mostly private (for my group)
• “My supervisor calls me again and we decide to publish our RO+paper”: RO snapshot <<copy>>. Identified by a URI; some metadata; some curation; mostly private (for my group and for paper reviewers)
• “Reviews received and final version published”: Archived RO <<copy, filter and curate>> (Librarian/Curator). Identified by a URI; good metadata and curation; mostly public
• “A new PhD student continues my work”: Live RO <<copy>>
• Relation shown between versions: <<versionOf>>

Page 8: Research Objects for improved sharing and reproducibility

How do you usually structure your experiment?

• In a set of folders?

• These could be profiles for how you normally structure your research

• Dropbox? Google Drive? GitHub?

• Overleaf+figshare? Whatever?

Page 9: Research Objects for improved sharing and reproducibility

Scattered Assets

Page 10: Research Objects for improved sharing and reproducibility

Multi-various products, platforms, resources

First class citizens - id, manage, credit, track, profile, focus

A Framework to Bundle, Port and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context. Units of exchange.

Research Objects

http://www.researchobject.org

Page 11: Research Objects for improved sharing and reproducibility

RO main principles

• Identity: Refer to aggregations and their contents
• Aggregation: Describe group & constituents (external ids, local files)
• Interpretation: The objects; how they are linked together
• Attribution: Who, when, where, why?
• Other diagram labels: manifest; Metadata; Description

Page 12: Research Objects for improved sharing and reproducibility

RO main principles: technologies

• Identity (identity persistence and resolution, names, citation): DOIs, URIs, Handles, ORCID
• Annotation (first class and stand-off): W3C OADM
• Aggregation (aggregations, resource maps, proxies): OAI-ORE
• manifest: Point of extendability

Page 13: Research Objects for improved sharing and reproducibility

RO Model Ontology

• Defines core concepts of research objects, identity, aggregation, annotation. Used in the manifest

• http://w3id.org/ro/

Page 14: Research Objects for improved sharing and reproducibility

Manifest – remote and local

on my machine

Page 15: Research Objects for improved sharing and reproducibility

Export, archive, publish and transfer ROs.

File format for storage and distribution of ROs as a ZIP archive

Includes an RO’s manifest, annotations and some or all of its aggregated resources

Basis for more specific file formats

Backwards compatible: it’s a ZIP

Programmatic access: JSON and JSON-LD manifest, API

https://researchobject.github.io/specifications/bundle/

https://w3id.org/bundle/ doi:10.5281/zenodo.10440
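
The bundle idea above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the reference tooling: it follows my reading of the RO Bundle specification linked above (a JSON-LD manifest stored at `.ro/manifest.json` with an `aggregates` list), it omits the `mimetype` entry and all annotations, and the function names `write_bundle` and `list_aggregated` and the file names in the usage note are made up for the example.

```python
import io
import json
import zipfile

# Manifest location as described in the RO Bundle specification (assumed here).
MANIFEST_PATH = ".ro/manifest.json"


def write_bundle(resources):
    """Pack resources (path -> bytes) and a minimal manifest into a ZIP, in memory."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        # One entry per aggregated resource, addressed relative to the bundle root.
        "aggregates": [{"uri": "/" + path} for path in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
        for path, content in resources.items():
            archive.writestr(path, content)
    return buffer.getvalue()


def list_aggregated(bundle):
    """Read the manifest back and return the URIs of the aggregated resources."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        manifest = json.loads(archive.read(MANIFEST_PATH))
    return [entry["uri"] for entry in manifest["aggregates"]]
```

For example, `list_aggregated(write_bundle({"paper.pdf": b"...", "data/results.csv": b"..."}))` returns the two resource URIs. Because the result is an ordinary ZIP file, any unzip tool can still open it, which is the backwards compatibility the slide refers to.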

Page 16: Research Objects for improved sharing and reproducibility

Page 17: Research Objects for improved sharing and reproducibility

Containers

Page 18: Research Objects for improved sharing and reproducibility

Research Objects: Scopes and Tooling

• http://www.researchobject.org/scopes/

• Farr Commons: http://www.farrcommons.org/

• ISA and FAIR-DOM http://fair-dom.org/

• SEEK http://seek4science.org/

• COMBINE

• BagIt (soon)

• White-labelled sci-domain-independent software

• http://rohub.linkeddata.es/

• http://www.rohub.org/

• http://www.researchobject.org/specifications/

• Core Ontologies and extensions

• RO managers/APIs/bundling (Ruby, Java, Python)

• Latex2RO

• LDP4RO

Page 19: Research Objects for improved sharing and reproducibility

Publishing may be as easy as…

• Providing the URL of the Research Object to the publisher, with a release tag, to start the review process (if extra review needed)

Page 20: Research Objects for improved sharing and reproducibility

Act 2

Reproducibility

Page 21: Research Objects for improved sharing and reproducibility

Terminology

Inspired by [Goble, 2012]

Page 22: Research Objects for improved sharing and reproducibility

Terminology

Inspired by [Goble, 2012]

Page 23: Research Objects for improved sharing and reproducibility

Terminology

Inspired by [Goble, 2012]

Page 24: Research Objects for improved sharing and reproducibility

Terminology

Inspired by [Goble, 2012]

Page 25: Research Objects for improved sharing and reproducibility

Terminology

Inspired by [Goble, 2012]

Page 26: Research Objects for improved sharing and reproducibility

The Research Method in different disciplines

[Table: columns INPUT DATA, SCIENTIFIC PROCEDURE, EQUIPMENT; rows IN VIVO/VITRO, IN SILICO]

Page 27: Research Objects for improved sharing and reproducibility

The Research Method in different disciplines

Lab book

Digital Log

Laboratory Protocol (recipe)

Workflow

Experiment

Page 28: Research Objects for improved sharing and reproducibility

The Research Method in different disciplines

[Table: columns INPUT DATA, SCIENTIFIC PROCEDURE, EQUIPMENT; rows IN VIVO/VITRO, IN SILICO]

Page 29: Research Objects for improved sharing and reproducibility

Some problems in lab protocols

Some of them present insufficient granularity; the instructions can be imprecise or ambiguous due to the use of natural language.

• Incubate the centrifuge tubes in a water bath.

• Incubate the samples for 5 min with gentle shaking.

• Rinse DNA briefly in 1-2 ml of wash.

• Incubate at -20 °C overnight.
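
To make the ambiguity concrete, here is a minimal sketch of a checker that flags instructions lacking an explicit temperature or duration, or using vague wording. It is a toy illustration only, not the approach used by the authors: the regular expressions, the list of vague terms and the function name `review` are all assumptions chosen for this example.

```python
import re

# Hypothetical checks: each names a detail a reader needs in order to repeat the step.
CHECKS = {
    "temperature": re.compile(r"-?\d+(?:\.\d+)?\s*°?\s*C\b"),
    "duration": re.compile(r"\b\d+(?:\.\d+)?\s*(?:s|sec|min|h|hours?)\b", re.IGNORECASE),
}

# Hypothetical list of words that leave the amount or intensity open.
VAGUE_TERMS = re.compile(r"\b(briefly|gentle|gently|overnight)\b", re.IGNORECASE)


def review(instruction):
    """Report which explicit details are missing and which vague terms appear."""
    missing = [name for name, pattern in CHECKS.items() if not pattern.search(instruction)]
    vague = [term.lower() for term in VAGUE_TERMS.findall(instruction)]
    return {"missing": missing, "vague": vague}


if __name__ == "__main__":
    for line in [
        "Incubate the centrifuge tubes in a water bath.",
        "Incubate the samples for 5 min with gentle shaking.",
        "Rinse DNA briefly in 1-2 ml of wash.",
        "Incubate at -20C overnight.",
    ]:
        print(line, review(line))
```

Every one of the four example instructions is flagged for at least one missing or vague detail, which is the kind of gap a structured protocol representation is meant to close.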

Page 30: Research Objects for improved sharing and reproducibility

Currently…

Semi-structured information

Unstructured information

How to formalize the information from laboratory protocols as a knowledge base?

Ontologies + NLP tools

Page 31: Research Objects for improved sharing and reproducibility

SMART Protocols - document

The Protocol as a document

Rhetorical and structural components (e.g. introduction, materials, and methods); information like application of the protocol, advantages and limitations, list of reagents, critical steps.

[Diagram: document module of the SMART Protocols ontology]

• sp:experimental protocol
• sp:introduction section: sp:application of the protocol, sp:advantage of the protocol, sp:limitation of the protocol, sp:provenance of the protocol, sp:purpose of the protocol
• sp:materials section: sp:buffer list, sp:equipment and supplies list, sp:kit list, sp:primer list, sp:reagent list, sp:software list, sp:solution list
• sp:methods section: exact:caution, sp:critical step, sp:hint, sp:pause point, sp:storage condition, sp:timing, sp:troubleshooting
• Other classes: iao:document, iao:document part, iao:textual entity, iao:data set, exact:alert message
• Relations: owl:subClassOf, ro:hasPart, ro:partOf

Page 32: Research Objects for improved sharing and reproducibility

SMART Protocols - wf

Representation of the workflow aspects in protocols: implicit order in the instructions, following the input output structure.

[Diagram: Basic Steps of DNA Extraction]

• Steps (p-plan:Step): sp:basic step of DNA extraction, sp:cell disruption, sp:digestion reaction, sp:DNA purification
• Variables (p-plan:Variable): sp:plant tissue, sp:powdered tissue, sp:digested contaminant, obi:DNA extract
• Relations: p-plan:hasInputVariable, p-plan:hasOutputVariable, bfo:isPrecededBy, owl:subClassOf
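
The idea that step order follows from the input/output structure can be sketched as below. This is a minimal illustration, not the SMART Protocols implementation: the `Step` class and function names are hypothetical, and the wiring of variables to steps (which step consumes or produces which variable) is my assumption based on the labels in the slide.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Step:
    name: str
    inputs: frozenset   # variables consumed (cf. p-plan:hasInputVariable)
    outputs: frozenset  # variables produced (cf. p-plan:hasOutputVariable)


def preceded_by(steps):
    """Map each step to the steps whose outputs it consumes (cf. bfo:isPrecededBy)."""
    return {
        step.name: sorted(
            other.name
            for other in steps
            if other is not step and other.outputs & step.inputs
        )
        for step in steps
    }


def execution_order(steps):
    """Return one step order consistent with the derived precedence relation."""
    return list(TopologicalSorter(preceded_by(steps)).static_order())


# Assumed wiring, deliberately listed out of order.
STEPS = [
    Step("DNA purification", frozenset({"digested contaminant"}), frozenset({"DNA extract"})),
    Step("cell disruption", frozenset({"plant tissue"}), frozenset({"powdered tissue"})),
    Step("digestion reaction", frozenset({"powdered tissue"}), frozenset({"digested contaminant"})),
]
```

Calling `execution_order(STEPS)` recovers the chain from cell disruption to DNA purification even though no explicit order was stated, only what each step consumes and produces.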

Page 33: Research Objects for improved sharing and reproducibility

SMART Protocols documentation

• SMART Protocols ontology is available here:

• http://vocab.linkeddata.es/SMARTProtocols/

• Giraldo O, García-Castro A, Corcho O. SMART Protocols: SeMAntic RepresenTation for Experimental Protocols. LISC2014

Page 34: Research Objects for improved sharing and reproducibility

SMART Protocols in action

sp = SMART Protocols, ro = Relation Ontology

[Diagram: an example protocol described with the ontology]

• Classes: sp:experimental protocol, sp:DNA extraction protocol, sp:advantages, sp:sample, sp:title of the protocol, sp:author entry, sp:application of the protocol
• Relations: owl:subClassOf, rdf:type, sp:hasAuthor, sp:hasTitle, ro:partOf

Page 35: Research Objects for improved sharing and reproducibility

SMART Protocols in action

Page 36: Research Objects for improved sharing and reproducibility

The Research Method in different disciplines

[Table: columns INPUT DATA, SCIENTIFIC PROCEDURE, EQUIPMENT; rows IN VIVO/VITRO, IN SILICO]

Page 37: Research Objects for improved sharing and reproducibility

Vocabularies and methodologies for representing and publishing workflows

[Diagram: Methodology for workflow publishing]

• WINGS (Core, Portal) on local laptop, on shared host, or on web server: Workflow Template, Workflow Instance, PROV export
• Wings workflow generation; OPM/PROV conversion; other workflow environments
• Linked Data publication into an RDF triple store: Workflow Provenance, Workflow Plan
• Users: interactive browsing (Pubby frontend); programmatic access (external apps)
• Publication, Share, Reuse

Repository of linked workflows: http://www.opmw.org/sparql

http://purl.org/net/p-plan

http://www.opmw.org/ontology/

Daniel Garijo and Yolanda Gil. 2011. A new approach for publishing workflows: abstractions, standards, and linked data. (WORKS '11). ACM, New York, NY, USA, 47-56.

Daniel Garijo and Yolanda Gil. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science 2012, Boston, 2012.
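
Programmatic access to a repository of linked workflows would typically go through its SPARQL endpoint. The sketch below only builds the request URL and does not send it, since the endpoint may no longer be online. It assumes that workflow templates are typed as `opmw:WorkflowTemplate` in the ontology namespace given above and that they carry an `rdfs:label`; the function name `build_request_url` is made up for the example.

```python
from urllib.parse import urlencode

# Endpoint named on the slide; availability is not guaranteed.
ENDPOINT = "http://www.opmw.org/sparql"

# Assumption: templates are instances of opmw:WorkflowTemplate and may have a label.
QUERY = """
PREFIX opmw: <http://www.opmw.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?template ?label WHERE {
  ?template a opmw:WorkflowTemplate .
  OPTIONAL { ?template rdfs:label ?label }
} LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})


if __name__ == "__main__":
    print(build_request_url(ENDPOINT, QUERY))
```

The resulting URL can be passed to any HTTP client, with an `Accept` header such as `application/sparql-results+json` to ask for JSON results.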

Page 38: Research Objects for improved sharing and reproducibility

Definition of workflow abstractions

Catalog of common independent workflow abstractions (motifs)

Data-oriented motifs: What kind of manipulations does the workflow have?

Workflow-oriented motifs: How does the workflow perform its operations?

Analysis of 260 different workflows from 10 domains, belonging to 5 different workflow systems

http://purl.org/net/wf-motifs#

Daniel Garijo, Pinar Alper, Khalid Belhajjame, Oscar Corcho, Yolanda Gil, Carole Goble, Common motifs in scientific workflows: An empirical analysis, Future Generation Computer Systems, Volume 36, July 2014, Pages 338-351

Page 39: Research Objects for improved sharing and reproducibility

Finding and evaluating common abstractions

https://github.com/dgarijo/FragFlow

http://purl.org/net/wf-fd

Graph mining techniques

Workflow fragment representation and linkage

Workflow fragment

Filtering techniques

Daniel Garijo, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Dinov, Paul Thompson, and Arthur W. Toga. FragFlow: Automated Fragment Detection in Scientific Workflows. In The 10th IEEE International Conference on e-Science, Guaruja, 2014

Page 40: Research Objects for improved sharing and reproducibility

How to preserve Workflows/Research Objects?

Three main ways/levels:

• Descriptive reproducibility
  • Documentation
• Workflow execution reproducibility
  • Can we run the workflow?
• Workflow results reproducibility
  • Can we get the same results?

Checklists!

• Corcho et al: Checklist for workflow conservation.
  • http://dx.doi.org/10.6084/m9.figshare.1285011
  • 40 different aspects
    • Documentation
    • Goals
    • Results
    • Metadata
• Corcho et al: Checklist for a workflow conservation plan
  • http://dx.doi.org/10.6084/m9.figshare.1285012
  • Based on the DCC’s data management plan
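
For the third level, "can we get the same results?", one simple check is to compare the outputs of the original run and a re-run by content hash. This is a sketch under assumptions, not part of the checklists cited above: the function names are hypothetical, outputs are passed in as name-to-bytes mappings, and byte-for-byte equality is a deliberately strict criterion that many workflows (for example those with floating-point or timestamped outputs) would need to relax.

```python
import hashlib


def digest(data):
    """Return the SHA-256 hex digest of an output's bytes."""
    return hashlib.sha256(data).hexdigest()


def compare_runs(original, rerun):
    """Compare two runs given as {output name: bytes}; report a status per output."""
    report = {}
    for name in sorted(set(original) | set(rerun)):
        if name not in rerun:
            report[name] = "missing in rerun"
        elif name not in original:
            report[name] = "unexpected in rerun"
        elif digest(original[name]) == digest(rerun[name]):
            report[name] = "identical"
        else:
            report[name] = "different"
    return report


def results_reproduced(original, rerun):
    """True only if both runs produced the same outputs with identical content."""
    return all(status == "identical" for status in compare_runs(original, rerun).values())
```

A report in which every output is "identical" supports results reproducibility; anything else points to the specific output that needs a closer look.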

Page 41: Research Objects for improved sharing and reproducibility

Some examples

Levels of reproducibility

Workflow conservation Plan

Page 42: Research Objects for improved sharing and reproducibility

The Research Method in different disciplines

[Table: columns INPUT DATA, SCIENTIFIC PROCEDURE, EQUIPMENT; rows IN VIVO/VITRO, IN SILICO]

Page 43: Research Objects for improved sharing and reproducibility

Reproducibility of Computational Scientific Experiments

[Diagram: FORMER EQUIPMENT, ANNOTATE, SEMANTIC ANNOTATIONS, REPRODUCE, EQUIVALENT EXECUTION ENVIRONMENT (CLOUD)]

Workflow systems and workflows shown:

• Pegasus: Montage, SoyKB, Epigenomics
• Dispel4Py: Internal Extinction, Seismic Cross Correlation
• Makeflow: Blast

Page 44: Research Objects for improved sharing and reproducibility

Some results

• Pegasus Montage Workflow

• Astronomy workflow

• Construct large image mosaics of the sky

• Montage Software distribution

• 59 binaries

• Target IaaS Cloud Providers

• Amazon EC2 & Futuregrid

• Vagrant

RO available at http://pegasus.isi.edu/publications/reppar

Page 45: Research Objects for improved sharing and reproducibility

Lessons learned for Anna

• Research Objects as a concept
  • Identity, annotation, aggregation
  • Adapted to the tools/infrastructure for each domain
  • With some tooling available already
• It’s not just data preservation but also methods
  • Lab protocols
  • Computational workflows
• Understand what reproducibility means for you

Page 47: Research Objects for improved sharing and reproducibility

Acknowledgements

• The Semantic e-Science team at UPM

• Carlos Badenes

• Daniel Garijo

• Olga Giraldo

• Rafael González-Cabero

• Idafen Santana

• The Wf4Ever team

• Carole Goble, José Manuel Gómez Pérez, Raúl Palma, Jun Zhao, Stian Soiland-Reyes, Khalid Belhajjame, José Enrique Ruíz, Marco Roos, Lourdes Verdes-Montenegro, Norman Morrison, Sean Bechoffer, Graham Klyne, Matt Gamble, and a large etcetera

• The Research Object community group

• http://www.researchobject.org/
