How the Encyclopedia of Life is wrangling organismal attribute data

Sep 01, 2014

Cyndy Parr

Lightning talk presented at iEvoBio 2013 in Snowbird, Utah
Transcript
Page 1: How the Encyclopedia of Life is wrangling organismal attribute data

eol.org @eol @cydparr

How the Encyclopedia of Life is wrangling organismal attribute data

Page 2: How the Encyclopedia of Life is wrangling organismal attribute data

How EOL works

EOL

Crowds

Harvest

Third party applications

Page 3: How the Encyclopedia of Life is wrangling organismal attribute data

EOL Today

Key Milestones in 2013

1.1 million species pages

240+ content providers

3.3 million unique annual visitors from 235 countries

Page 4: How the Encyclopedia of Life is wrangling organismal attribute data

[Figure: bar chart of the number of text objects (axis from 0 to 800,000) for each subject of text object. Subjects shown: Distribution, Molecular Biology, Multiple topics, Type Information, Habitat, Conservation Status, Threats, Morphology, Conservation, Management, Trends, Size, Associations, Uses, Trophic Strategy, Cyclicity & Life Cycle, Population Biology, Reproduction, Migration, Taxonomy, Life Expectancy, Identification, Behaviour, Ecology, Diseases.]

Page 5: How the Encyclopedia of Life is wrangling organismal attribute data

Text mining, crowdsourcing, standardizing (see http://eol.org/info/fellows)

Co-occurrence, term extraction & linked data: Thessen & Devries

EnvO habitat terms: Pafilis et al.

Altitude Specificity of Flower Coloration: Wright

Morphological impacts of extinction risk in fish: Chang

Butterfly-hostplant associations: Ferrer-Paris et al.

Species Interactions: Poelen & Mungall et al.

Page 6: How the Encyclopedia of Life is wrangling organismal attribute data

14 datasets containing 25k taxa, 422k interactions, for 3k locations

alpha version of ingestion, normalization, aggregation

alpha version of web API

alpha version of data exports

Dr. Katy Börner led Information Visualization MOOC

GLoBI http://globalbioticinteractions.wordpress.com/

Page 7: How the Encyclopedia of Life is wrangling organismal attribute data

EOL TraitBank

Funded: Marine focus

Virtuoso triple store, re-using URIs where possible

5 datasets, 128,050 data points for 20,896 taxa

Harvest and display on data tab

Downloads, fancy searching

Machine access
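
To illustrate why re-using URIs matters in a triple store, here is a minimal sketch in plain Python. It is not TraitBank's schema or the Virtuoso API: every URI, term name, and value below is hypothetical (under example.org), and the `Triple` type and `taxa_with` helper are invented for this example. The point is only that when two datasets use the same URI for a habitat term, one query finds taxa from both.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement about a taxon."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace; real data would re-use existing ontology URIs.
EX = "http://example.org/"

# Two imaginary datasets describe different taxa but share the same
# habitat URI, so their records line up without any string matching.
triples = [
    Triple(EX + "taxon/1", EX + "term/bodyMass", "12.5"),
    Triple(EX + "taxon/1", EX + "term/habitat", EX + "term/marineBiome"),
    Triple(EX + "taxon/2", EX + "term/habitat", EX + "term/marineBiome"),
]


def taxa_with(predicate: str, obj: str) -> list[str]:
    """Return the sorted subjects that have the given predicate and object."""
    return sorted({t.subject for t in triples
                   if t.predicate == predicate and t.obj == obj})


marine_taxa = taxa_with(EX + "term/habitat", EX + "term/marineBiome")
print(marine_taxa)
```

If the second dataset had minted its own URI for the same habitat, the query would silently miss that taxon, which is the cost that URI re-use avoids.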

Page 8: How the Encyclopedia of Life is wrangling organismal attribute data
Page 9: How the Encyclopedia of Life is wrangling organismal attribute data
Page 10: How the Encyclopedia of Life is wrangling organismal attribute data

Uploads & harvests will be by spreadsheet and Darwin Core Archive

Support for annotation and curation

Please contact me to be part of the private beta

Page 11: How the Encyclopedia of Life is wrangling organismal attribute data

Easy access to analyzable trait data

“Are blue organisms more common in high altitudes?”

“Does the evolution of mammalian bacula appear to be related to the pattern of promiscuous mating?”

“What organisms should I collect to fill in gaps in genome quality tissue collections?”

• Look for trait, download for all taxa
• Create a collection of taxa, download all data
• Use Reol: an R interface to EOL (Banbury, O’Meara) http://reolblog.wordpress.com/
• Find more specialized data repositories

Page 12: How the Encyclopedia of Life is wrangling organismal attribute data

But also . . .

Page 13: How the Encyclopedia of Life is wrangling organismal attribute data

Thanks

Funding & other contributions

Sloan Foundation
Smithsonian Institution
David Rubenstein
Marine Biological Laboratory
Harvard University
Our content partners
Thousands of individual contributors, and hundreds of volunteer curators

Image credits

Jenny from Taipei

Cynthia Parr, Chief Scientist @eol

@cydparr [email protected]

Alexandria Archive: Sarah Kansa, Eric Kansa, 34 other zooarchaeologists

GLoBI: Jorrit Poelen (lead/software), Chris Mungall (ontologies), James Simons (biologist) and Robert Reiz (software). Datasets shared by: Peter D. Roopnarine, Rachel Hertog, Carlos García-Robledo, James Simons, Jenny L. Wrast, C. Barnes, International Council for the Exploration of the Sea (ICES), Jose R. Ferrer Paris, Senol Akin, Malcolm Storey (BioInfo.org.uk), Ivy E. Baremore, Joel Sachs (SPIRE), Colt W. Cook, David A. Blewett

Page 14: How the Encyclopedia of Life is wrangling organismal attribute data

Quick math

In Phenoscape, 57 publications had 565,158 anatomical trait descriptions for 2,527 kinds of organisms = 223 traits/organism

In ZFIN, 38,189 trait descriptions for 4,727 zebrafish genes

1.9 million species on the planet

= LOTS OF TRAITS
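
The arithmetic behind this slide can be checked with a short sketch. The counts are the ones quoted above; the final extrapolation is an assumption added here for illustration (that every species would be described as densely as the Phenoscape organisms), not a figure from the talk.

```python
# Counts quoted on the slide.
phenoscape_descriptions = 565_158
phenoscape_organisms = 2_527
zfin_descriptions = 38_189
zfin_genes = 4_727
species_on_planet = 1_900_000

# Average trait descriptions per kind of organism in Phenoscape.
traits_per_organism = phenoscape_descriptions / phenoscape_organisms

# Average trait descriptions per zebrafish gene in ZFIN.
descriptions_per_gene = zfin_descriptions / zfin_genes

# Assumption (not on the slide): scale the Phenoscape density to all species.
projected_traits = traits_per_organism * species_on_planet

print(f"Phenoscape: {traits_per_organism:.1f} traits per organism")
print(f"ZFIN: {descriptions_per_gene:.1f} descriptions per gene")
print(f"Projected: {projected_traits / 1e6:.0f} million trait descriptions")
```

The Phenoscape quotient is about 223.6, which the slide reports as 223; scaled to 1.9 million species, it gives on the order of 425 million trait descriptions.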

Page 15: How the Encyclopedia of Life is wrangling organismal attribute data

Anatolia Zooarchaeology Case Study led by Alexandria Archive Institute

1. 14 different sites
2. 34+ zooarchaeologists
3. Decoding, cleanup, metadata documentation
4. 220,000+ specimens
5. 450 entities linked to 143 EOL taxon concepts
6. Anatomical entities linked to Uberon.org
7. Biometrics linked to measurement ontology
8. Collaborative analysis

http://opencontext.org/