One small(ish) step for modellers, one giant leap for mankind
Capturing the context
Mihai Glonț
Reproducible and Citable Data and Models
Warnemuende, September 2015
A simple(?) question
How easy is it to find reusable models? At a minimum, a reusable model should be:
– Reproducible
– Released under a friendly licence
– Understandable
Problems
How do we recognise concepts? Is adenosine5PrimeTriphosphate a better variable name than a? Do all modellers have the same knowledge about ATP?
How can we uniquely identify the concepts involved in a modelling exercise?
Web 2.0
● Prevalence of content generators
● Social media
● Rich user interfaces
● Folksonomies
● Software as a service
Web 3.0
● Semantic Web
● “The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries” (W3C)
● Machines understand the data on the web and can reason about it
● Implicit knowledge is captured in a machine-processable manner
● What holiday options are there for a family of four for 10 days, somewhere sunny and close to the sea, with good food and a budget of EUR 3000?
Semantic Web overview
● Taxonomies and ontologies define concepts (resources) and the relationships between them
● Identification through URIs
● Data is exchanged as RDF
Ontologies
● Define concepts, instances, attributes and relationships
● Workshop is a kind of Thing
● Workshop hasA location
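To make the two example statements concrete, here is a minimal sketch of how a program might hold a tiny ontology and reason over it. Everything in it is an illustrative assumption: the intermediate class "Event", the class "Place" and the domain/range of "hasLocation" are hypothetical and do not come from any published ontology.

```python
# Hypothetical mini-ontology; class and property names are illustrative only.
SUBCLASS_OF = {
    "Workshop": "Event",  # assumption: an intermediate class "Event"
    "Event": "Thing",
    "Place": "Thing",     # assumption: a class for locations
}
PROPERTIES = {
    # property name -> (domain class, range class)
    "hasLocation": ("Event", "Place"),
}


def is_kind_of(cls, ancestor):
    """True if cls is ancestor or (transitively) a subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)  # walk one step up the hierarchy
    return False


def can_have(cls, prop):
    """A class may use a property if it is a kind of the property's domain."""
    domain, _range = PROPERTIES[prop]
    return is_kind_of(cls, domain)


print(is_kind_of("Workshop", "Thing"))      # True
print(can_have("Workshop", "hasLocation"))  # True
print(can_have("Place", "hasLocation"))     # False
```

The point of the sketch is that "Workshop is a kind of Thing" never has to be stated directly: it is inferred by following subclass links, which is the kind of implicit knowledge an ontology makes machine-processable.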
RDF Primer
● Resource Description Framework
● Documents consist of a series of statements
● Statements (triples) take the form Subject – Predicate – Object
Example triple:
● Subject: https://sems.uni-rostock.de/reproducible-and-citable-data-and-models/
● Predicate: http://example.com/someOntology/hasLocation
● Object: https://en.wikipedia.org/wiki/Warnemunde
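The example triple can be written out in N-Triples, one of the plain-text RDF serialisations, where each IRI sits in angle brackets and each statement ends with " .". The sketch below is a simplification under stated assumptions: it handles IRIs only (no literals, blank nodes or escaping), and the `match` helper is a hypothetical stand-in for what a query language such as SPARQL does with triple patterns.

```python
# Triples as plain (subject, predicate, object) tuples of IRI strings.
WORKSHOP = "https://sems.uni-rostock.de/reproducible-and-citable-data-and-models/"
HAS_LOCATION = "http://example.com/someOntology/hasLocation"
WARNEMUNDE = "https://en.wikipedia.org/wiki/Warnemunde"

triples = [(WORKSHOP, HAS_LOCATION, WARNEMUNDE)]


def to_ntriples(triples):
    """Serialise IRI-only triples as N-Triples lines: <s> <p> <o> ."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


print(to_ntriples(triples))
# "Where is the workshop held?"
print([o for _, _, o in match(triples, s=WORKSHOP, p=HAS_LOCATION)])
```

Because every part of the statement is a URI, another program can follow the predicate or the object to find out more about them, which is what lets machines combine statements from different sources.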
A selection of ontologies for life scientists
● ChEBI: http://www.ebi.ac.uk/chebi/
● GO: http://geneontology.org/
● BRENDA Tissue Ontology: http://www.brenda-enzymes.org/
● FMA: http://bioportal.bioontology.org/ontologies/FMA
● Human disease ontology: http://disease-ontology.org/
● TEDDY: http://purl.bioontology.org/ontology/TEDDY/
● KiSAO: http://co.mbine.org/standards/kisao
● SBO: http://www.ebi.ac.uk/sbo/
● Ontology Lookup Service: https://www.ebi.ac.uk/ontology-lookup/
Identifiers, identifiers, identifiers
● Is http://purl.uniprot.org/taxonomy/9606
the same as http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606 or
http://taxonomy.bio2rdf.org/describe/?url=http://bio2rdf.org/taxonomy:9606 ?
● What if the URIs change?
● What if the URIs don't point to anything?
Introducing identifiers.org
● The aim of the identifiers.org project is to provide unique, stable, resolvable and location-independent URIs to identify and to locate scientific data
● Community-driven
● Free to use
Creating unique URIs
• Homo sapiens in Taxonomy (9606)
http://identifiers.org/taxonomy/9606
– taxonomy: data collection
– 9606: entity identifier
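The URI pattern above (base, then data collection, then entity identifier) is simple enough to build and take apart mechanically. The following is a minimal sketch that assumes exactly that two-part path layout; the helper names `make_uri` and `parse_uri` are hypothetical and not part of any identifiers.org library.

```python
from urllib.parse import urlparse

BASE = "http://identifiers.org/"


def make_uri(collection, entity_id):
    """Compose <base><data collection>/<entity identifier>."""
    return f"{BASE}{collection}/{entity_id}"


def parse_uri(uri):
    """Split an identifiers.org URI back into (collection, entity identifier)."""
    parsed = urlparse(uri)
    if parsed.netloc != "identifiers.org":
        raise ValueError(f"not an identifiers.org URI: {uri}")
    # Split on the first "/" only, so identifiers that contain "/" survive.
    collection, _, entity_id = parsed.path.lstrip("/").partition("/")
    return collection, entity_id


uri = make_uri("taxonomy", "9606")
print(uri)             # http://identifiers.org/taxonomy/9606
print(parse_uri(uri))  # ('taxonomy', '9606')
```

Keeping the collection name in the URI is what makes the identifier unique: 9606 on its own could mean anything, whereas taxonomy/9606 names one entity in one data collection.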
Creating resolvable URIs
http://identifiers.org/taxonomy/9606
• URI to identify the entity 'Homo sapiens' in the data collection Taxonomy
• Resolves to the following resources (one of them flagged as primary):
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606
http://www.uniprot.org/taxonomy/9606
http://www.ebi.ac.uk/ena/data/view/Taxon:9606
• Information page: http://info.identifiers.org/taxonomy/9606
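One way to picture resolution is a registry that stores, for each data collection, a URL template per resource with a placeholder for the entity identifier. The sketch below assumes such a layout; the `REGISTRY` dictionary and the `resolve` function are hypothetical, the templates are simply the resource URLs above with 9606 replaced by `{id}`, and the real registry's data model (including the primary-resource flag) is not reproduced.

```python
# Hypothetical registry: data collection -> URL templates of its resources.
REGISTRY = {
    "taxonomy": [
        "http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id={id}",
        "http://www.uniprot.org/taxonomy/{id}",
        "http://www.ebi.ac.uk/ena/data/view/Taxon:{id}",
    ],
}


def resolve(collection, entity_id):
    """Return every physical URL known for one location-independent identifier."""
    templates = REGISTRY.get(collection)
    if templates is None:
        raise KeyError(f"unknown data collection: {collection}")
    return [t.format(id=entity_id) for t in templates]


def info_page(collection, entity_id):
    """Assumed layout of the information page, following the example above."""
    return f"http://info.identifiers.org/{collection}/{entity_id}"


for url in resolve("taxonomy", "9606"):
    print(url)
print(info_page("taxonomy", "9606"))
```

The design point is that only the registry entry changes when a resource moves: the identifiers.org URI written into a model stays the same, which addresses the earlier question of what happens when URIs change.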
Inter-conversion of identifier schemes
• Registry records different identifier schemes
• Web service for inter-conversion between identifier schemes
http://purl.obolibrary.org/obo/GO_0005886
http://bio2rdf.org/go:0005886
http://identifiers.org/go/GO:0005886
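The three URIs above differ only in how they wrap the same 7-digit GO number, so conversion amounts to extracting that number and re-wrapping it. This is a minimal sketch assuming exactly these three URI forms; the regular expressions, scheme labels and the `convert` function are illustrative and are not the conversion rules or web service interface used by identifiers.org.

```python
import re

# One (pattern, template) pair per scheme; written for GO URIs only.
SCHEMES = {
    "obo": (re.compile(r"^http://purl\.obolibrary\.org/obo/GO_(\d{7})$"),
            "http://purl.obolibrary.org/obo/GO_{num}"),
    "bio2rdf": (re.compile(r"^http://bio2rdf\.org/go:(\d{7})$"),
                "http://bio2rdf.org/go:{num}"),
    "identifiers.org": (re.compile(r"^http://identifiers\.org/go/GO:(\d{7})$"),
                        "http://identifiers.org/go/GO:{num}"),
}


def convert(uri, target):
    """Extract the GO number from any known scheme and rebuild it in target."""
    for pattern, _template in SCHEMES.values():
        m = pattern.match(uri)
        if m:
            return SCHEMES[target][1].format(num=m.group(1))
    raise ValueError(f"unrecognised GO URI: {uri}")


print(convert("http://purl.obolibrary.org/obo/GO_0005886", "identifiers.org"))
# http://identifiers.org/go/GO:0005886
```

Recording each scheme once in a registry means any URI can be mapped to any other with one lookup, instead of writing a converter for every pair of schemes.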
Support for different formats
[Figure: formats offered by the resources of the Taxonomy collection: html, RDF, json]
• The Registry records the formats provided by the various data resources
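If the registry knows which formats each resource serves, a client can pick a resource that returns data in the form it can process. The sketch below is an assumption-laden illustration: the resource names and their format sets in `FORMATS` are hypothetical placeholders, and `choose` only mimics the idea of a preference list (similar in spirit to an HTTP Accept header) without making any network request.

```python
# Hypothetical registry records for one collection: resource -> formats served.
FORMATS = {
    "resource_a": {"html"},
    "resource_b": {"html", "rdf"},
    "resource_c": {"html", "json"},
}


def resources_offering(fmt):
    """Return, in sorted order, the resources that serve the requested format."""
    return sorted(name for name, fmts in FORMATS.items() if fmt in fmts)


def choose(preferences):
    """Return (format, resources) for the first preferred format anyone serves."""
    for fmt in preferences:
        found = resources_offering(fmt)
        if found:
            return fmt, found
    return None


print(resources_offering("html"))      # ['resource_a', 'resource_b', 'resource_c']
print(choose(["xml", "rdf", "html"]))  # ('rdf', ['resource_b'])
```

A machine client would typically ask for RDF or JSON first and fall back to HTML, which is only useful to a human reader.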