Text Mining and Environmental Metadata Suggestion
ENA – 1st Dec 2014 – EBI, UK
Evangelos Pafilis
Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC)
Hellenic Centre for Marine Research (HCMR), Heraklio, Crete, Greece
http://epafilis.info
Microbes are key players in both healthy and degraded coral reefs. A combination of metagenomics, microscopy, culturing, and water chemistry was used to characterize microbial communities on four coral atolls in the Northern Line Islands, central Pacific.
Pafilis E et al. (2013) The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text. PLoS ONE 8(6): e65390
* Based on a single-thread run on a 2.27 GHz Intel CPU with 24 GB RAM, processing a set of 536,052 abstracts.
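The dictionary-based tagging idea behind these resources can be illustrated in a few lines of Python. This is a minimal sketch, not the published tagger: the dictionary entries and the `TERM:...` identifiers are hypothetical placeholders (not real ENVO accessions), tokenization is a simple regular expression, and matching is greedy longest-match over word windows.

```python
import re

# Hypothetical dictionary: lower-cased surface form -> placeholder term identifier.
# The identifiers are illustrative only; they are NOT real ENVO accessions.
DICTIONARY = {
    "coral reef": "TERM:coral_reef",
    "coral reefs": "TERM:coral_reef",
    "coral atoll": "TERM:atoll",
    "coral atolls": "TERM:atoll",
    "water": "TERM:water",
}
# Longest dictionary entry, in words; bounds the size of the matching window.
MAX_WORDS = max(len(name.split()) for name in DICTIONARY)


def tag(text):
    """Return (start, end, matched_text, term_id) tuples using greedy longest match."""
    # Word tokens as character spans; punctuation such as "-" acts as a separator.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    hits = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest window first so "coral reefs" wins over shorter entries.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            key = " ".join(text[s:e].lower() for s, e in tokens[i:i + n])
            if key in DICTIONARY:
                match = (start, end, text[start:end], DICTIONARY[key])
                i += n  # skip past the matched words
                break
        if match:
            hits.append(match)
        else:
            i += 1
    return hits
```

For example, `tag("Microbes are key players in both healthy and degraded coral reefs.")` returns `[(54, 65, 'coral reefs', 'TERM:coral_reef')]`. Because keys are built from lower-cased word tokens, "Coral-Reefs" would match the same entry; a production tagger would use a much larger dictionary and faster lookup structures.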
biome
environmental feature
environmental material
environmental condition
habitat …
Based on slides by Dr. Pier Luigi Buttigieg, AWI, Bremerhaven, Germany
http://environmentontology.org ~1600 terms, June 2013
ENVO: source of environment descriptor names and synonyms
ENVIRONMENTS – Improving Accuracy
● Increasing matches in text
● Orthographic variation supported, e.g. freshwater, fresh water, and fresh-water
● Case-insensitive matching
● Synonym generation to reflect the way environment-descriptive terms are mentioned in text (both generic and ENVO-specific)
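The orthographic-variation and case-insensitive matching points can be sketched as follows. This assumes canonical names are stored with spaces between words (e.g. "fresh water"); the helper names `orthographic_variants`, `build_lookup` and `find` are hypothetical, and the rule "join words with nothing, a space or a hyphen" is an illustrative assumption, not necessarily the rule ENVIRONMENTS uses.

```python
import itertools

# Separators assumed to occur between the words of a multi-word name.
SEPARATORS = ("", " ", "-")


def orthographic_variants(term):
    """All lower-cased spellings of a term joined by nothing, a space or a hyphen."""
    words = term.lower().split()
    if not words:
        return set()
    variants = set()
    # One separator choice per gap between consecutive words: 3**(n-1) variants.
    for seps in itertools.product(SEPARATORS, repeat=len(words) - 1):
        pieces = [words[0]]
        for sep, word in zip(seps, words[1:]):
            pieces.append(sep + word)
        variants.add("".join(pieces))
    return variants


def build_lookup(names):
    """Hypothetical helper: map every variant to its canonical name."""
    lookup = {}
    for name in names:
        for variant in orthographic_variants(name):
            lookup.setdefault(variant, name)
    return lookup


def find(lookup, mention):
    """Case-insensitive lookup: mentions are lower-cased before matching."""
    return lookup.get(mention.lower())
```

With `lookup = build_lookup(["fresh water", "coral reef"])`, the mentions "Fresh-Water", "FRESHWATER" and "fresh water" all resolve to "fresh water". The number of variants grows as 3^(n-1) for an n-word name, which stays small for typical environment descriptors.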
Tags corresponding to “Habitat” text data object: http://eol.org/data_objects/31415353 of EOL Taxon Phoenicopterus ruber (Greater Flamingo): http://eol.org/pages/913221
Traversing all IS_A and PART_OF relationships in ENVO
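Traversing IS_A and PART_OF relationships lets a mention of a specific term also count toward its more general ancestors. The sketch below does a breadth-first traversal over a toy graph; the edges are illustrative assumptions and are NOT taken from ENVO, and `ancestors` and `expand_tags` are hypothetical helper names.

```python
from collections import deque

# Toy ontology fragment: child -> list of (relation, parent).
# Edges are illustrative only and are NOT copied from ENVO.
EDGES = {
    "coral reef": [("is_a", "reef")],
    "reef": [("part_of", "marine environment")],
    "lagoon": [("part_of", "atoll")],
    "atoll": [("is_a", "island")],
}


def ancestors(term, relations=("is_a", "part_of")):
    """Breadth-first traversal over the selected relation types; returns all ancestors."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in EDGES.get(current, []):
            # The seen-set keeps the traversal finite even if the graph had cycles.
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def expand_tags(tags, relations=("is_a", "part_of")):
    """Add every ancestor of the given tags, e.g. to count mentions at general levels."""
    expanded = set(tags)
    for term in tags:
        expanded |= ancestors(term, relations)
    return expanded
```

Here `ancestors("lagoon")` yields `{"atoll", "island"}`, and restricting `relations` to `("is_a",)` follows only the subclass links. A real implementation would first load the relations from the ontology file rather than a hand-written dictionary.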
Download ENVIRONMENTS
• Home Page: http://environments.hcmr.gr/
• Tagger Software:
Summary
• Importance of standardized metadata and annotations
• ENVO: standardized, hierarchically organized descriptions of environment types
• Literature, project and other scientific-content web pages may describe the environmental context of a metagenomics sample
• ENVIRONMENTS:
  • Dictionary-based identification of environment-descriptive terms
  • Ontological community standards (e.g. ENVO) as the name source
  • Command-line application
  • Browser extensions: a user-friendly interface
    • Highly interactive
    • Can be used while browsing the web
    • Extract ENVO terms from a selected part of a web page
  • Extended for organism, disease, and tissue mention identification
Digging Out Information
Images: http://hartpurylrc.files.wordpress.com; photo by Dr E. Chatzinikolaou
Critical Assessment of Information Extraction in Biology
BioCreative: Metagenomics Track
• Preparing a Metagenomics Track as part of the BioCreative 2015 challenge
• Aim: improve the environmental-context annotation of sequences in major metagenomics repositories
• Track coordinator: Dr. L. Hirschman, MITRE
• BioCreative (www.biocreative.org)
ACTION ES1103
ENVIRONMENTS-EOL: http://environments-eol.blogspot.com/
Encyclopedia of Life (EOL): http://www.eol.org
• Process EOL taxon pages
• Extract environmental context (ENVO terms)
• EOL Taxon Page: Quick Facts, Data tab
• Integrated in TraitBank
• Large-scale biological questions
Rubenstein Fellowship 2013, in collaboration with: Jennifer Hammock, Patrick Leary, Katja Schulz, Cyndy Parr
SEQenv: http://environments.hcmr.gr/seqenv.html
• Annotate microbial sequences with ENVO terms
• Sequence analysis, literature mining, visualization
• GenBank isolation source, PubMed abstracts
• Sample comparison, temporal/spatial pattern analysis
• Extension: proteins, protein families, 3D visualization
Reused: analysis of American bird habitats, http://blog.eol.org/ (NoPlaceLikeHome, in collaboration with: Rob Stevenson, Carl Nordman)
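The "sample comparison" step can be pictured as turning each sample's environmental annotations into a term-frequency profile and comparing profiles. The sketch below is one plausible way to do this, not the SEQenv implementation: it assumes the isolation-source texts of each sample's sequence hits have already been retrieved (the similarity search is not shown), uses a hypothetical four-term list instead of ENVO, and picks cosine similarity as an illustrative comparison measure.

```python
import math
import re
from collections import Counter

# Hypothetical term list; a real pipeline would use ENVO names and synonyms.
TERMS = ["sea water", "sediment", "coral reef", "soil"]
PATTERNS = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in TERMS}


def term_profile(isolation_sources):
    """Relative frequency of each term across the isolation-source texts of one sample.

    Each text contributes at most one count per term (presence/absence).
    """
    counts = Counter()
    for text in isolation_sources:
        for term, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[term] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse term profiles (0.0 if either is empty)."""
    dot = sum(p[t] * q.get(t, 0.0) for t in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0
```

With made-up inputs, `term_profile(["coastal sea water, 5 m depth", "coral reef surface", "Sea water"])` gives "sea water" a weight of 2/3 and "coral reef" 1/3, and its cosine similarity to a soil/sediment profile is 0.0. Profiles built this way could also be tracked over time or across sites for temporal/spatial pattern analysis.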
Santos A et al. (under review), preprint: http://biorxiv.org/content/early/2014/11/10/010975
Frankild S et al. (under review), preprint: http://biorxiv.org/content/early/2014/08/25/008425
Pafilis E et al. (2013) The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text. PLoS ONE 8(6): e65390
Acknowledgements
HCMR-IMBG: Christos Arvanitidis, Christina Pavloudi, Katerina Vasileiadou, Lucia Fanini, Sarah Faulwetter, Anastasis Oulas
NNF CPR: Lars Juhl Jensen, Sune Frankild
U Mass: Rob Stevenson
Uni Glasgow: Christopher Quince, Umer Ijaz
EOL: Cynthia Parr, Jennifer Hammock, Patrick Leary, Katja Schulz
MM-MPI: J. Schnetzer
AWI: Dr P. Buttigieg
HITS: Dr. S. Berger
and more