Rphenoscape: Connecting the semantics of evolutionary morphology to comparative phylogenetics
Hilmar Lapp (Duke University), Hong Xu (Duke University), Jim Balhoff (RTI, Inc.)
Evolution Meetings 2016, Austin, TX


Apr 16, 2017


Hilmar Lapp
Transcript
Page 1: Rphenoscape: Connecting the semantics of evolutionary morphology to comparative phylogenetics

Rphenoscape: Connecting the semantics of evolutionary morphology to comparative phylogenetics

Hilmar Lapp (Duke University), Hong Xu (Duke University), Jim Balhoff (RTI, Inc.)

Evolution Meetings 2016, Austin, TX

Page 2

RPhenoscape

• A package for accessing the Phenoscape Knowledgebase from within R programs
• Programmatic access to:
  • Evolutionary character data with computable semantics
  • Machine-reasoning with computable phenotype data

Page 3

R features a rich ecosystem for comparative phylogenetics

The CRAN Task View on Phylogenetics and Comparative Methods listed 76 packages at last count.

Page 4

Comparative analysis needs comparative data

Magee et al. (2014), PLOS ONE. See also Drew et al. (2013), PLOS Biology; Stoltzfus et al. (2012), BMC Research Notes.

Page 5

The lack of reusable digital data is amplified for morphology

Which matrix are we criticizing?

RC07 published their character list (their appendix 2) and their matrix (appendix 3). We also have a hitherto unpublished NEXUS file (presented here as part of Data S3), most likely sent by [M.R.] to [D.G.] in late 2007 or early 2008, which purports to contain the same matrix. Surprisingly, the character list in the paper and that in the file do not agree on the identities of characters 132–134.

Marjanović and Laurin (2015) Reevaluation of the largest published morphological data matrix for phylogenetic analysis of Paleozoic limbed vertebrates. PeerJ Preprints 3:e1596v1

Page 6

Morphology is more complex than discrete, disjoint, independent

Page 7

We know more about morphology than authors state

Page 8

Implied knowledge can be substantial

Figure (digit presence/absence; Sarcopterygii). Legend: Asserted, Inferred, Missing.

Page 9

kb.phenoscape.org

Make morphology computable, discoverable, & linked to genetic data

Page 10

Phenoscape Knowledgebase

❖ 4,399 taxa (vertebrates)

❖ 139 publications (matrices)

❖ 19,024 character states

❖ 651,660 phenotype annotations

Diagram labels: Morphological matrices; Annotation with Phenex software (Balhoff et al., 2010); Ontologies (anatomy, quality, taxonomy); Phenoscape Knowledgebase; Machine reasoner (OWL).

Page 11

KB Interface for humans

Page 12

KB Interface for machines

Page 13

KB Interface for machines

Page 14

RPhenoscape

An R package for API access to the Phenoscape Knowledgebase

• Evolutionary character data with computable semantics
• Machine-reasoning with computable phenotype data:
  • Synthetic supermatrix synthesis
  • Semantics-based character and state filtering
  • Semantic similarity-driven querying and synthesis

Page 15

Use-case: Querying studies by morphology and taxonomy

> slist <- pk_get_study_list(taxon = "Ictaluridae", entity = "pectoral fin")

> slist[,"label"]

Source: local data frame [10 x 1]

   label
   <chr>
1  Bockmann, F. A. (1998)
2  Chen, X. (1994)
3  De Pinna, M. C., Ferraris, C. J. J., & Vari, R. P. (2007)
4  Fink, S. V, & Fink, W. L. (1981); Fink, S. V, & Fink, W. L. (1996)
5  Kailola, P. J. (2004)
6  Lundberg, J. G. (1992)
7  Mo, T. (1991)
8  Royero, R. (1999)
9  Vigliotta, T. R. (2008)
10 Wiley, E.O., and Johnson, G.D. (2010)

Page 16

Use-case: Querying studies by morphology and taxonomy

> nex_list <- pk_get_study_xml(as.matrix(slist[2:3, c("id")]))
....This might take a while....
https://scholar.google.com/scholar?q=hylogenetic+studies+of+the+amblycipitid+catfishes+%28Teleostei%2C+Siluriformes%29+with+species+accounts&btnG=&hl=en&as_sdt=0%2C42
Parse NeXML....
http://dx.doi.org/10.1111/j.1096-3642.2007.00306.x
Parse NeXML....

> nex_list[[1]]
A nexml object representing:
  0 phylogenetic tree blocks, where:
  block 1 contains NULL phylogenetic trees
  block 0 contains phylogenetic trees
  155 meta elements
  1 character matrices
  53 taxonomic units
Taxa: Pseudobagarius leucorhynchus, Liobagrus obesus, Hypsidoris farsonensis, Erethistes sp. (Chen 1994), Bunocephalus amaurus, Xyliphius sp. (Chen 1994) ...

Page 17

Heavy-lifting of NeXML parsing is done by RNeXML

Page 18

Use-case: Synthesize presence-absence matrix

> nex <- pk_get_ontotrace_xml(taxon = c("Ictalurus", "Ameiurus"), entity = "fin spine")
> m <- pk_get_ontotrace(nex)
> m[1:10,]
## Source: local data frame [15 x 5]
##
##                      taxa         otu
##                     (chr)       (chr)
## 1       Ameiurus brunneus VTO_0036273
## 2          Ameiurus catus VTO_0036275
## 3          Ameiurus melas VTO_0036272
## 4        Ameiurus natalis VTO_0036274
## 5      Ameiurus nebulosus VTO_0036278
## 6  Ameiurus platycephalus VTO_0036276
## 7   Ameiurus serracanthus VTO_0036277
## 8     Ictalurus australis VTO_0061495
## 9      Ictalurus balsanus VTO_0036221
## 10      Ictalurus dugesii VTO_0061497
## Variables not shown: otus (chr), anterior dentation of pectoral fin spine
##   (int), anterior distal serration of pectoral fin spine (dbl)

Page 19

Use-case: Filter matrix using semantics

> is_desc <- pk_is_descendant('Ictalurus', m$taxa)

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
## [12]  TRUE  TRUE  TRUE  TRUE

# pk_is_ancestor() also available (and pk_is_extinct())
#
# This is in development for characters, too:
# pk_is_descendant('jaw skeleton', m$chars,
#                  relationships = 'part of')
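A logical vector like the one returned by pk_is_descendant() can be used directly to subset the rows of the matrix. The following is a minimal base-R sketch of that filtering step, not code from the slides. It assumes a small stand-in data frame (only the taxa and otu columns, with values copied from the output on the previous slide) and a hard-coded logical vector in place of the live call, so it runs without access to the Phenoscape KB.

```r
# Stand-in for a few rows of the matrix `m` from the previous slide.
# Assumption: only the `taxa` and `otu` columns are reproduced here.
m <- data.frame(
  taxa = c("Ameiurus brunneus", "Ameiurus catus",
           "Ictalurus australis", "Ictalurus balsanus"),
  otu  = c("VTO_0036273", "VTO_0036275", "VTO_0061495", "VTO_0036221"),
  stringsAsFactors = FALSE
)

# Hard-coded stand-in for pk_is_descendant('Ictalurus', m$taxa);
# the real call queries the Phenoscape KB over the network.
is_desc <- c(FALSE, FALSE, TRUE, TRUE)

# Keep only the rows whose taxon is a descendant of Ictalurus.
m_ictalurus <- m[is_desc, ]
print(m_ictalurus$taxa)
```

With the real package, is_desc would come from the pk_is_descendant() call shown above and m from pk_get_ontotrace().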

Page 20

Current limitations

Package is not on CRAN (yet); it needs to be installed from GitHub:

library("devtools")
install_github("xu-hong/rphenoscape")

Data in Phenoscape are concentrated on vertebrates and on skeletal fin-limb characters.

Semantics-driven matrix synthesis is currently limited to presence-absence characters.

Page 21

Summary

Phenoscape KB has an API for machine access to computable morphology data and computational semantics services.

RPhenoscape is a bridge between this API and the ecosystem of comparative phylogenetics packages in R.

It translates between the R user (who uses labels and data matrices) and the Phenoscape KB API (which uses identifiers, ontology terms, NeXML, etc.).
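As a toy illustration of what translating between labels and identifiers means, the base-R sketch below maps taxon labels to VTO identifiers through a hard-coded lookup table. This is an assumption-laden stand-in, not how RPhenoscape is implemented: the real package resolves labels through the KB API. The helper name label_to_id is hypothetical, and the identifiers are copied from the ontotrace output shown earlier.

```r
# Hard-coded label-to-identifier table (values copied from the ontotrace
# output shown earlier). Assumption: the real package looks these up via
# the Phenoscape KB API rather than a local table.
vto_ids <- c(
  "Ameiurus melas"     = "VTO_0036272",
  "Ameiurus natalis"   = "VTO_0036274",
  "Ictalurus balsanus" = "VTO_0036221"
)

# Hypothetical helper: returns the identifier for each label,
# or NA where the label is not in the table.
label_to_id <- function(labels, table = vto_ids) {
  unname(table[labels])
}

ids <- label_to_id(c("Ictalurus balsanus", "Ameiurus melas", "Unknown taxon"))
print(ids)
```

The user works with the human-readable labels on the left, while requests to the KB are made with the identifiers on the right.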

Code on Github: https://github.com/xu-hong/rphenoscape

Page 22

Reproducible data integration: The rOpenSci ecosystem

Page 23

Acknowledgements

U.S. National Science Foundation DBI-1062404, DBI-1062542

National Evolutionary Synthesis Center (NESCent), NSF #EF-0905606

Phenoscape contributors, Advisory Board, Data sources (see: http://phenoscape.org/wiki/Acknowledgments)

RNeXML developers (C. Boettiger, S. Chamberlain) http://github.com/rOpenSci/RNeXML

Page 24

Get in touch

Repo: github.com/xu-hong/rphenoscape

Github: github.com/hlapp

Twitter: @hlapp