Rise of the machines Rutger A. Vos, Hilmar Lapp, William H. Piel, Val Tannen
TreeBASE2: Rise of the Machines

May 10, 2015

Rutger Vos

TreeBASE is a public repository of peer-reviewed phylogenetic knowledge. Researchers submit their results to TreeBASE while preparing a manuscript based on them for publication in a suitable journal. The submitted data are assigned permanent, unique identifiers and web addresses that authors can refer to in their article. Anyone can locate and access the data once the study has been published by TreeBASE and by the targeted journal.

A prototype of this system has served the phylogenetics community well for a number of years, accumulating the results of thousands of studies. The usage model was that of a silo where data could only be accessed through a web browser, and only be downloaded in representations that omitted important associated metadata. A human with considerable expertise needed to read and interpret the web pages through which everything was served up to make sense out of what was available.

This model is not always practical. For example, phyloinformatic research often uses so much data that automation is becoming necessary. Where human intervention is no longer feasible, machines – which are stupid – must be able to do the job instead; and they need to be told what is what. This has spurred more explicit standardization of the syntax and semantics of phylogenetic knowledge. The latest version of TreeBASE facilitates this by adopting a collection of community standards:

• PhyloWS for automated searching using a contextual query language and retrieval using a clearly defined URL API.

• NeXML for robust data syntax and flexible metadata annotation.

• CDAO (and other ontologies) for defining the semantics of the metadata.

We will present an overview of how these components work together to make phylogenetic knowledge accessible to machines on the semantic web. Using this new architecture, client-side software (including off-the-shelf tools such as RSS readers) can query, transform and download TreeBASE data autonomously.
Transcript
Page 1: TreeBASE2: Rise of the Machines

Rise of the machines

Rutger A. Vos, Hilmar Lapp, William H. Piel, Val Tannen

Page 2: TreeBASE2: Rise of the Machines

What is TreeBASE?

A repository of user-submitted phylogenies and source data.

Accepts all types of comparative data for all taxa. Data are public once published in a peer-reviewed medium.

Data in preparation are available to the editors or reviewers using a special access code.

Page 3: TreeBASE2: Rise of the Machines

Web app 

Page 4: TreeBASE2: Rise of the Machines

The machine-readable web

Locations on the web are increasingly visited by machines instead of human eyes.

Programmable interfaces with structured return values

Page 5: TreeBASE2: Rise of the Machines

The TreeBASE web API

Objects can be found using CQL

Permanent, simple URLs

Every object a resolvable resource

Serialized in various formats

Page 6: TreeBASE2: Rise of the Machines

Searching using CQL

Contextual Query Language: a standard for queries to information retrieval systems

Hides database schema

Instead, search on predicates

Search results as RSS
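As a rough illustration of searching on a predicate, the sketch below builds a search URL from the PhyloWS base address and the `prism.publicationName==Evolution` predicate that appear elsewhere in this talk. The `find` path segment and the `query` parameter name are assumptions not shown on the slides, as is the helper name `search_url`.

```python
from urllib.parse import urlencode

# Base of the PhyloWS URL space, taken from the resource URI shown in the talk.
PHYLOWS_BASE = "http://purl.org/phylo/treebase/phylows"


def search_url(object_type, predicate, value, fmt="RSS1"):
    """Build a CQL search URL for one predicate.

    ASSUMPTIONS: the 'find' path segment and the 'query' parameter name
    are not given in the slides; only the 'format' parameter is.
    """
    cql = f"{predicate}=={value}"  # CQL exact-match clause
    params = urlencode({"query": cql, "format": fmt})
    return f"{PHYLOWS_BASE}/{object_type}/find?{params}"


if __name__ == "__main__":
    print(search_url("study", "prism.publicationName", "Evolution"))
```

The client only names a predicate and a value; nothing in the URL refers to tables or columns, which is the sense in which CQL hides the database schema.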

Page 7: TreeBASE2: Rise of the Machines

PhyloWS Resource URI

• PURL domain: purl.org

• Phylogenetics: phylo

• TreeBASE: treebase

• PhyloWS: phylows

• Object ID: TB2:S1787

http://purl.org/phylo/treebase/phylows/study/TB2:S1787
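A minimal sketch of how a client might take such a resource URI apart, assuming every resource URI follows the same five-segment path layout as the example above. The function name and the dictionary keys are hypothetical.

```python
from urllib.parse import urlsplit


def split_phylows_uri(uri):
    """Split a PhyloWS resource URI into its labelled parts.

    ASSUMPTION: the path always has the five segments seen in the
    example URI: phylo / treebase / phylows / <object type> / <object ID>.
    """
    parts = urlsplit(uri)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 5:
        raise ValueError(f"unexpected path layout: {parts.path}")
    domain_area, repository, service, object_type, object_id = segments
    return {
        "purl_domain": parts.netloc,
        "domain_area": domain_area,
        "repository": repository,
        "service": service,
        "object_type": object_type,
        "object_id": object_id,
    }


if __name__ == "__main__":
    uri = "http://purl.org/phylo/treebase/phylows/study/TB2:S1787"
    for key, value in split_phylows_uri(uri).items():
        print(f"{key}: {value}")
```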

Page 8: TreeBASE2: Rise of the Machines

Same data, different formats

?format=NEXUS: Flat file standard for phylogenetics

?format=NeXML: XML redesign of NEXUS

?format=RDF: CDAO/RDF mapping of NeXML

?format=HTML: Web page describing the resource

?format=RSS1: RSS1.0 feed for search results
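The format switch can be sketched as a small helper that sets the `format` query parameter on a resource URI. The format names are spelled as on the slide; whether the server treats them case-sensitively is not stated in the talk, and the helper name `with_format` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Format names and descriptions as listed on the slide.
FORMATS = {
    "NEXUS": "Flat file standard for phylogenetics",
    "NeXML": "XML redesign of NEXUS",
    "RDF": "CDAO/RDF mapping of NeXML",
    "HTML": "Web page describing the resource",
    "RSS1": "RSS1.0 feed for search results",
}


def with_format(uri, fmt):
    """Return the URI with its 'format' query parameter set to fmt."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    scheme, netloc, path, query, fragment = urlsplit(uri)
    # Drop any existing format parameter, keep everything else.
    pairs = [(k, v) for k, v in parse_qsl(query) if k != "format"]
    pairs.append(("format", fmt))
    return urlunsplit((scheme, netloc, path, urlencode(pairs), fragment))


if __name__ == "__main__":
    study = "http://purl.org/phylo/treebase/phylows/study/TB2:S1787"
    for name in FORMATS:
        print(with_format(study, name))
```

The resource URI stays the same for every serialization; only the query parameter changes.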

Page 9: TreeBASE2: Rise of the Machines

Data and metadata

TreeBASE holds a lot of metadata, for example:

• Lat/long coordinates for specimen samples

• Literature metadata

• Identifiers

Using the newer serialization formats (NeXML and RDF) we can embed all of them using predicates from a variety of ontologies.

Page 10: TreeBASE2: Rise of the Machines

External links

Taxon

Taxon variant

Study

Page 11: TreeBASE2: Rise of the Machines

Example: Journal feeds

prism.publicationName==Evolution
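A feed returned for such a query could be consumed with nothing more than a standard XML parser. The sketch below reads item titles and links from an RSS 1.0 document. The embedded feed is a hand-written placeholder (titles and channel address are made up), not actual TreeBASE output, and `feed_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by RDF and by RSS 1.0.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# PLACEHOLDER feed for illustration only; not real search results.
SAMPLE_FEED = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/feed">
    <title>Placeholder search results</title>
    <link>http://example.org/feed</link>
    <description>Placeholder description</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://purl.org/phylo/treebase/phylows/study/TB2:S1787"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://purl.org/phylo/treebase/phylows/study/TB2:S1787">
    <title>Placeholder study title</title>
    <link>http://purl.org/phylo/treebase/phylows/study/TB2:S1787</link>
  </item>
</rdf:RDF>"""


def feed_items(xml_text):
    """Return (title, link) pairs for every item in an RSS 1.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("rss:title", namespaces=NS),
            item.findtext("rss:link", namespaces=NS),
        )
        for item in root.findall("rss:item", NS)
    ]


if __name__ == "__main__":
    for title, link in feed_items(SAMPLE_FEED):
        print(title, link)
```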

Page 12: TreeBASE2: Rise of the Machines

Example: UniProt sequences

TreeBASE stores NCBI taxonomy identifiers

Standard tools can rewrite these linkout URLs

Result is a corresponding list of UniProt records
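The rewriting step might look like the sketch below: pull the NCBI taxonomy identifier out of a linkout URL and substitute it into a UniProt query URL. Both the input URL forms and the UniProt URL template are assumptions, since the talk does not spell them out. Identifier 9606 is the NCBI taxonomy ID for Homo sapiens.

```python
import re

# ASSUMED target pattern; the talk does not give the UniProt URL form.
UNIPROT_TEMPLATE = "https://www.uniprot.org/uniprotkb?query=taxonomy_id:{taxid}"


def ncbi_taxid(linkout_url):
    """Extract the trailing numeric NCBI taxonomy identifier from a URL.

    ASSUMPTION: the identifier is the run of digits at the end of the URL.
    """
    match = re.search(r"(\d+)$", linkout_url.rstrip("/"))
    if match is None:
        raise ValueError(f"no taxonomy identifier in: {linkout_url}")
    return match.group(1)


def to_uniprot(linkout_url, template=UNIPROT_TEMPLATE):
    """Rewrite a taxonomy linkout URL into a UniProt query URL."""
    return template.format(taxid=ncbi_taxid(linkout_url))


if __name__ == "__main__":
    # Two hypothetical linkout forms that end in the same identifier.
    print(to_uniprot("http://purl.uniprot.org/taxonomy/9606"))
    print(to_uniprot("https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606"))
```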

Page 13: TreeBASE2: Rise of the Machines

Example: ToLWeb pages

TreeBASE maps to uBio using skos:closeMatch...

…and uBio to ToL using gla:mapping
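The two-step mapping amounts to a join over two predicates. A minimal sketch over an in-memory list of triples follows. All identifiers are hypothetical placeholders, and the prefixed names are kept as plain strings rather than expanded to full URIs.

```python
# HYPOTHETICAL triples; the identifiers are placeholders, not real records.
TRIPLES = [
    ("treebase:taxon/1", "skos:closeMatch", "ubio:namebank/100"),
    ("ubio:namebank/100", "gla:mapping", "tolweb:200"),
    ("treebase:taxon/2", "skos:closeMatch", "ubio:namebank/101"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def tolweb_pages(triples, taxon):
    """Follow TreeBASE -> uBio (skos:closeMatch) -> ToL (gla:mapping)."""
    pages = []
    for ubio_record in objects(triples, taxon, "skos:closeMatch"):
        pages.extend(objects(triples, ubio_record, "gla:mapping"))
    return pages


if __name__ == "__main__":
    print(tolweb_pages(TRIPLES, "treebase:taxon/1"))
    print(tolweb_pages(TRIPLES, "treebase:taxon/2"))
```

A taxon whose uBio record has no `gla:mapping` simply yields an empty list, so missing links in the chain do not need special handling.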

Page 14: TreeBASE2: Rise of the Machines

Example: geocoding

TreeBASE uses DarwinCore for lat/lon annotations
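As a sketch of how a client could pick up such annotations, the code below scans an XML document for metadata elements whose `property` attribute names the Darwin Core terms `decimalLatitude` and `decimalLongitude`. The sample fragment is hand-written: the element layout, the prefix spelling, and the coordinate values are assumptions for illustration and are not taken from TreeBASE output.

```python
import xml.etree.ElementTree as ET

# PLACEHOLDER fragment; structure and values are assumptions.
SAMPLE = """<nexml xmlns="http://www.nexml.org/2009"
    xmlns:nex="http://www.nexml.org/2009"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <otus id="otus1">
    <otu id="otu1" label="Placeholder taxon">
      <meta id="m1" xsi:type="nex:LiteralMeta"
            property="dwc:decimalLatitude" content="52.16"/>
      <meta id="m2" xsi:type="nex:LiteralMeta"
            property="dwc:decimalLongitude" content="4.49"/>
    </otu>
  </otus>
</nexml>"""


def coordinates(xml_text):
    """Return (latitude, longitude) from Darwin Core annotations, or None.

    Matching is done on the local part of the 'property' attribute,
    case-insensitively, so the prefix spelling does not matter.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for element in root.iter():
        local_name = element.get("property", "").split(":")[-1].lower()
        if local_name in ("decimallatitude", "decimallongitude"):
            found[local_name] = float(element.get("content"))
    if len(found) == 2:
        return found["decimallatitude"], found["decimallongitude"]
    return None


if __name__ == "__main__":
    print(coordinates(SAMPLE))
```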

Page 15: TreeBASE2: Rise of the Machines

What's next?

Make TreeBASE Linked Data compliant

Make TreeBASE extensible with additional annotations using external triple store

Page 16: TreeBASE2: Rise of the Machines

Acknowledgements