Rphenoscape: Connecting the semantics of evolutionary morphology to comparative phylogenetics
Hilmar Lapp (Duke University), Hong Xu (Duke University), Jim Balhoff (RTI, Inc.)
Evolution Meetings 2016, Austin, TX
RPhenoscape
• A package for accessing the Phenoscape Knowledgebase from within R programs
• Programmatic access to:
  • Evolutionary character data with computable semantics
  • Machine reasoning with computable phenotype data
R features a rich ecosystem for comparative phylogenetics
The CRAN Task View on Phylogenetics and Comparative Methods lists, at last count, 76 packages.
Comparative analysis needs comparative data
Magee et al. (2014), PLOS ONE. See also Drew et al. (2013), PLOS Biology; Stoltzfus et al. (2012), BMC Research Notes.
The lack of reusable digital data is amplified for morphology
Which matrix are we criticizing?
RC07 published their character list (their appendix 2) and their matrix (appendix 3). We also have a hitherto unpublished NEXUS file (presented here as part of Data S3), most likely sent by [M.R.] to [D.G.] in late 2007 or early 2008, which purports to contain the same matrix. Surprisingly, the character list in the paper and that in the file do not agree on the identities of characters 132–134.
Marjanović and Laurin (2015) Reevaluation of the largest published morphological data matrix for phylogenetic analysis of Paleozoic limbed vertebrates. PeerJ Preprints 3:e1596v1
Implied knowledge can be substantial
[Figure: digit presence/absence across Sarcopterygii, with each state coded as asserted, inferred, or missing]
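The point of this slide is that machine reasoning can fill in states that no matrix explicitly asserts. As a minimal base-R sketch of the idea, the code below applies two presence-absence inference patterns in a toy setting: an asserted-present part implies that the structure containing it is present, and an asserted-absent structure implies that its parts are absent. The `part_of` table (a simplified digit → autopod → limb chain) and the function `infer_states()` are hypothetical and hand-coded for illustration; this is not the Phenoscape KB's OWL reasoner or an RPhenoscape function.

```r
# Hypothetical, simplified part_of relations: each name is part of its value.
# Illustrative only; the Phenoscape KB reasons over full OWL ontologies.
part_of <- c(digit = "autopod", autopod = "limb")

# Asserted states for one taxon: TRUE = present, FALSE = absent, NA = not asserted
asserted <- c(digit = TRUE, autopod = NA, limb = NA)

# Repeatedly apply two rules until no state changes:
#   1. a present part implies its containing structure is present
#   2. an absent containing structure implies its part is absent
infer_states <- function(states, part_of) {
  repeat {
    changed <- FALSE
    for (part in names(part_of)) {
      whole <- part_of[[part]]
      if (isTRUE(states[[part]]) && is.na(states[[whole]])) {
        states[[whole]] <- TRUE
        changed <- TRUE
      }
      if (isFALSE(states[[whole]]) && is.na(states[[part]])) {
        states[[part]] <- FALSE
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  states
}

infer_states(asserted, part_of)
```

Asserting only "digit present" yields inferred presence of autopod and limb; asserting only "limb absent" (`c(digit = NA, autopod = NA, limb = FALSE)`) instead yields inferred absence of autopod and digit. Presence propagates upward and absence propagates downward, which is why a single asserted state can imply many others.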
kb.phenoscape.org
Make morphology computable, discoverable, & linked to genetic data
Phenoscape Knowledgebase
❖ 4,399 taxa (vertebrates)
❖ 139 publications (matrices)
❖ 19,024 character states
❖ 651,660 phenotype annotations
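[Figure: Phenoscape workflow. Morphological matrices are annotated with ontology terms (anatomy, quality, taxonomy) using Phenex software (Balhoff et al., 2010); the annotations feed the Phenoscape Knowledgebase, which is processed by a machine reasoner (OWL).]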
RPhenoscape
An R package for API access to the Phenoscape Knowledgebase
• Evolutionary character data with computable semantics
• Machine reasoning with computable phenotype data:
  • Supermatrix synthesis
  • Semantics-based character and state filtering
  • Semantic-similarity-driven querying and synthesis
Use-case: Querying studies by morphology and taxonomy
> slist <- pk_get_study_list(taxon = "Ictaluridae", entity = "pectoral fin")
> slist[,"label"]
Source: local data frame [10 x 1]

   label
   <chr>
1  Bockmann, F. A. (1998)
2  Chen, X. (1994)
3  De Pinna, M. C., Ferraris, C. J. J., & Vari, R. P. (2007)
4  Fink, S. V, & Fink, W. L. (1981); Fink, S. V, & Fink, W. L. (1996)
5  Kailola, P. J. (2004)
6  Lundberg, J. G. (1992)
7  Mo, T. (1991)
8  Royero, R. (1999)
9  Vigliotta, T. R. (2008)
10 Wiley, E.O., and Johnson, G.D. (2010)
Use-case: Querying studies by morphology and taxonomy (continued)
> nex_list <- pk_get_study_xml(as.matrix(slist[2:3, c("id")]))
....This might take a while....
https://scholar.google.com/scholar?q=hylogenetic+studies+of+the+amblycipitid+catfishes+%28Teleostei%2C+Siluriformes%29+with+species+accounts&btnG=&hl=en&as_sdt=0%2C42
Parse NeXML....
http://dx.doi.org/10.1111/j.1096-3642.2007.00306.x
Parse NeXML....
> nex_list[[1]]
A nexml object representing:
     0 phylogenetic tree blocks, where:
     block 1 contains NULL phylogenetic trees
     block 0 contains phylogenetic trees
     155 meta elements
     1 character matrices
     53 taxonomic units
  Taxa: Pseudobagarius leucorhynchus, Liobagrus obesus, Hypsidoris farsonensis, Erethistes sp. (Chen 1994), Bunocephalus amaurus, Xyliphius sp. (Chen 1994) ...
Use-case: Synthesize presence-absence matrix
> nex <- pk_get_ontotrace_xml(taxon = c("Ictalurus", "Ameiurus"), entity = "fin spine")
> m <- pk_get_ontotrace(nex)
> m[1:10,]
## Source: local data frame [15 x 5]
##
##                      taxa         otu
##                     (chr)       (chr)
## 1       Ameiurus brunneus VTO_0036273
## 2          Ameiurus catus VTO_0036275
## 3          Ameiurus melas VTO_0036272
## 4        Ameiurus natalis VTO_0036274
## 5      Ameiurus nebulosus VTO_0036278
## 6  Ameiurus platycephalus VTO_0036276
## 7   Ameiurus serracanthus VTO_0036277
## 8     Ictalurus australis VTO_0061495
## 9      Ictalurus balsanus VTO_0036221
## 10      Ictalurus dugesii VTO_0061497
## Variables not shown: otus (chr), anterior dentation of pectoral fin spine
##   (int), anterior distal serration of pectoral fin spine (dbl)
Use-case: Filter matrix using semantics
> is_desc <- pk_is_descendant('Ictalurus', m$taxa)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
## [12]  TRUE  TRUE  TRUE  TRUE
# pk_is_ancestor() is also available (as is pk_is_extinct())
#
# This is in development for characters, too:
# pk_is_descendant('jaw skeleton', m$chars,
#                  relationships = 'part of')
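The logical vector returned by `pk_is_descendant()` can then be used to subset the matrix with ordinary data-frame indexing. A minimal sketch, assuming a small stand-in for `m` built from rows visible in the printed output above (character columns omitted) and a hard-coded `is_desc` in place of the live API call:

```r
# Stand-in for the data frame `m` returned by pk_get_ontotrace(); only rows
# and columns visible in the printed output are reproduced here.
m <- data.frame(
  taxa = c("Ameiurus brunneus", "Ameiurus catus",
           "Ictalurus australis", "Ictalurus balsanus"),
  otu  = c("VTO_0036273", "VTO_0036275", "VTO_0061495", "VTO_0036221"),
  stringsAsFactors = FALSE
)

# In a live session this would be pk_is_descendant('Ictalurus', m$taxa);
# here it is hard-coded to match the four stand-in rows.
is_desc <- c(FALSE, FALSE, TRUE, TRUE)

# Keep only the taxa that fall within Ictalurus
m_ictalurus <- m[is_desc, , drop = FALSE]

# The complement: taxa outside Ictalurus
m_other <- m[!is_desc, , drop = FALSE]

m_ictalurus
```

`drop = FALSE` keeps the result a data frame even if only one column is selected; the same indexing works unchanged on the tibble that `pk_get_ontotrace()` returns.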
Current limitations
• Package is not on CRAN (yet); install it from GitHub:
  library("devtools")
  install_github("xu-hong/rphenoscape")
• Data in Phenoscape are concentrated on vertebrates and on skeletal fin/limb characters.
• Semantics-driven matrix synthesis is currently limited to presence-absence characters.
Summary
• Phenoscape KB has an API for machine access to computable morphology data and computational semantics services.
• RPhenoscape is a bridge between this API and the ecosystem of comparative phylogenetics packages in R.
• It translates between the R user (who works with labels and data matrices) and the Phenoscape KB API (which uses identifiers, ontology terms, NeXML, etc.).
Code on GitHub: https://github.com/xu-hong/rphenoscape
Acknowledgements
U.S. National Science Foundation DBI-1062404, DBI-1062542
National Evolutionary Synthesis Center (NESCent), NSF #EF-0905606
Phenoscape contributors, Advisory Board, Data sources (see: http://phenoscape.org/wiki/Acknowledgments)
RNeXML developers (C. Boettiger, S. Chamberlain) http://github.com/rOpenSci/RNeXML
Get in touch
Repo: github.com/xu-hong/rphenoscape
GitHub: github.com/hlapp
Twitter: @hlapp