Content Mine + Europe PubMedCentral. Peter Murray-Rust, ContentMine.org and UniversityOfCambridge. Wellcome Trust, London, UK, 2016-02-08. Getpapers [0] and AMI [1] download and analyze papers from EuropePubMedCentral. [0][1] F/OSS tools from contentmine.org

ContentMine + EPMC: Finding Zika!

Feb 11, 2017


TheContentMine
Transcript

Research Data Management

Hi, I'm here to talk about AMI, a data extraction framework and tool. First, I just want to highlight some of the key contributors to the project: Andy for his work on the ChemistryVisitor and Peter for the overall architecture.

In this talk, I'm going to stress the importance of data in a specific format and its utility for automated machine processing. Then I'm going to demonstrate AMI's architecture and the transformation of data as it flows through the process. I'm going to dwell a little on a core format used, Scalable Vector Graphics (SVG), before introducing the concept of visitors, which are pluggable, context-specific data extractors. Next, I'm going to introduce Andy's ChemVisitor, for extracting semantic chemistry data, along with a few other visitors that can process non-chemistry-specific data. Finally, I will demonstrate some uses of the ChemVisitor within the realm of validation and metabolism.
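To make the idea of visitors a little more concrete, here is a minimal sketch in Python of pluggable extractors running over an SVG document. It is not AMI's code or API (AMI itself is a separate tool); the names `Visitor`, `TextVisitor`, `LineCountVisitor` and `run_visitors`, and the sample SVG, are all hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Protocol

# Standard SVG namespace, needed to address elements with ElementTree.
SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical sample input: a tiny SVG page with two text runs and one line.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <text x="10" y="20">C2H6O</text>
  <line x1="0" y1="0" x2="10" y2="10"/>
  <text x="10" y="40">ethanol</text>
</svg>"""


class Visitor(Protocol):
    """Hypothetical plug-in interface: each visitor extracts one kind of data."""

    name: str

    def visit(self, root: ET.Element) -> list[str]: ...


class TextVisitor:
    """Collects the character content of every SVG <text> element."""

    name = "text"

    def visit(self, root: ET.Element) -> list[str]:
        return [el.text for el in root.iter(SVG_NS + "text") if el.text]


class LineCountVisitor:
    """Reports how many SVG <line> elements the page contains."""

    name = "lines"

    def visit(self, root: ET.Element) -> list[str]:
        return [str(sum(1 for _ in root.iter(SVG_NS + "line")))]


def run_visitors(svg_source: str, visitors: list[Visitor]) -> dict[str, list[str]]:
    """Parse the SVG once, then hand the same tree to every visitor."""
    root = ET.fromstring(svg_source)
    return {v.name: v.visit(root) for v in visitors}


results = run_visitors(SAMPLE_SVG, [TextVisitor(), LineCountVisitor()])
print(results)
```

The point of the design is that the parsing step is shared, while each visitor carries its own domain knowledge, so a new extractor can be added without touching the others.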

Automated Semantic Fulltext
EuropePMC provides coherent OpenAccess.
getpapers: wrapper for repos and search engines.
AMI filters, checks [1], transforms facts in papers. Here:
Sequences in text
Species and genera
Genes
User dictionaries (RRIDs, chemistry, places, phylo)
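As a rough illustration of what filtering facts out of text can look like, the following Python sketch covers three of the categories on the slide: sequences, species, and a user dictionary. It is an assumption-laden toy, not how AMI implements its filters: the regular expressions, the function names, the genus list and the sample sentence are all invented here, and gene detection is left out.

```python
import re

# Assumption: a "sequence" is a run of at least 10 DNA base letters.
SEQUENCE_RE = re.compile(r"\b[ACGT]{10,}\b")

# Assumption: a candidate binomial is a capitalised word followed by a
# lowercase word of three or more letters, e.g. "Aedes aegypti".
BINOMIAL_RE = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def find_sequences(text: str) -> list[str]:
    """Return every DNA-like run of base letters found in the text."""
    return SEQUENCE_RE.findall(text)


def find_species(text: str, known_genera: set[str]) -> list[str]:
    """Return candidate binomials whose first word is a known genus."""
    return [
        f"{genus} {epithet}"
        for genus, epithet in BINOMIAL_RE.findall(text)
        if genus in known_genera
    ]


def find_dictionary_terms(text: str, terms: list[str]) -> list[str]:
    """Return the dictionary terms that occur in the text.

    Uses case-insensitive substring matching, which is crude: a short
    term can match inside a longer, unrelated word.
    """
    lowered = text.lower()
    return sorted(term for term in terms if term.lower() in lowered)


# Hypothetical sample sentence, not taken from any real paper.
SAMPLE = (
    "Zika virus is transmitted by Aedes aegypti. "
    "The primer ACGTACGTACGTAC was used in Brazil."
)

sequences = find_sequences(SAMPLE)
species = find_species(SAMPLE, {"Aedes"})
places = find_dictionary_terms(SAMPLE, ["Brazil", "Colombia"])
print(sequences, species, places)
```

The genus check is what keeps ordinary phrases such as "Zika virus" or "The primer" from being reported as species, which is one reason a curated dictionary matters as much as the pattern itself.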

[0] All operations shown run in total of