Research Data Management
ContentMine + Europe PubMed Central
Peter Murray-Rust, ContentMine.org and University of Cambridge
Wellcome Trust, London, UK
2016-02-08
getpapers[0] and AMI[1] download and analyze papers from
Europe PubMed Central
[0][1] F/OSS tools from contentmine.org
Hi, I'm here to talk about AMI, a data extraction framework and
tool. First, I want to highlight some of the key contributors to the
project: Andy for his work on the ChemistryVisitor and Peter for
the overall architecture.
In this talk, I'm going to stress the importance of having data in a
specific format and how useful that is for automated machine processing.
Then I'm going to demonstrate AMI's architecture and the
transformation of data as it flows through the process. I'm going to
dwell a little on a core format it uses, Scalable Vector Graphics
(SVG), before introducing the concept of visitors, which are
pluggable, context-specific data extractors. Next, I'm going to
introduce Andy's ChemVisitor, for extracting semantic chemistry
data, along with a few other visitors that can process
data that is not specific to chemistry. Finally, I will demonstrate
some uses of the ChemVisitor in the areas of validation and
metabolism.
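To make the visitor idea concrete, here is a minimal sketch in Python. It is not AMI's actual code or API: the names `Fact`, `TextVisitor`, `LineCountVisitor` and `run_visitors` are hypothetical, and the only assumption taken from the talk is that pluggable extractors each inspect the same SVG input and report what they find.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# SVG elements are namespaced; ElementTree exposes tags as "{namespace}name".
SVG_NS = "{http://www.w3.org/2000/svg}"


@dataclass
class Fact:
    """One extracted item, tagged with the (hypothetical) visitor that found it."""
    visitor: str
    value: str


class TextVisitor:
    """Hypothetical visitor: collects the content of every <text> element."""
    name = "text"

    def visit(self, svg_root: ET.Element) -> List[Fact]:
        facts = []
        for element in svg_root.iter(SVG_NS + "text"):
            content = (element.text or "").strip()
            if content:
                facts.append(Fact(self.name, content))
        return facts


class LineCountVisitor:
    """Hypothetical visitor: reports how many <line> primitives the page has."""
    name = "lines"

    def visit(self, svg_root: ET.Element) -> List[Fact]:
        count = sum(1 for _ in svg_root.iter(SVG_NS + "line"))
        return [Fact(self.name, str(count))]


def run_visitors(svg_source: str, visitors) -> List[Fact]:
    """Parse the SVG once, then let each plugged-in visitor extract its facts."""
    root = ET.fromstring(svg_source)
    facts: List[Fact] = []
    for visitor in visitors:
        facts.extend(visitor.visit(root))
    return facts


SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <text x="10" y="20">OH</text>
  <line x1="0" y1="0" x2="10" y2="10"/>
  <line x1="10" y1="10" x2="20" y2="0"/>
</svg>"""

if __name__ == "__main__":
    for fact in run_visitors(SAMPLE_SVG, [TextVisitor(), LineCountVisitor()]):
        print(fact)
```

The point of the design is that adding a new kind of extraction means writing one more class with a `visit` method; the code that reads the SVG does not change.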
Automated Semantic Fulltext
EuropePMC provides coherent Open Access
getpapers: wrapper for repos and search engines.
AMI filters, checks[1], transforms facts in papers. Here:
  Sequences in text
  Species and genera
  Genes
  User dictionaries (RRIDs, chemistry, places, phylo)
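As a rough illustration of what filtering facts out of full text can look like, the sketch below finds nucleotide sequences with a regular expression and looks up terms from user dictionaries. It is an assumption-laden toy, not AMI's implementation: the pattern `DNA_SEQUENCE`, the minimum length of 8, the helper names and the sample dictionaries are all hypothetical.

```python
import re
from typing import Dict, List, Set

# Assumed pattern: a run of at least 8 unambiguous nucleotide letters.
DNA_SEQUENCE = re.compile(r"\b[ACGT]{8,}\b")


def find_sequences(text: str) -> List[str]:
    """Return candidate nucleotide sequences in order of appearance."""
    return DNA_SEQUENCE.findall(text)


def find_dictionary_terms(text: str,
                          dictionaries: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per dictionary, the sorted terms that occur as whole words."""
    hits: Dict[str, List[str]] = {}
    for name, terms in dictionaries.items():
        found = sorted(
            term for term in terms
            if re.search(r"\b" + re.escape(term) + r"\b", text)
        )
        if found:
            hits[name] = found
    return hits


# Hypothetical user dictionaries, standing in for species or place lists.
SAMPLE_DICTIONARIES = {
    "species": {"Mus musculus", "Homo sapiens"},
    "places": {"Cambridge", "London"},
}

SAMPLE_TEXT = (
    "Primer GATTACAGATTACA amplified the gene in Mus musculus "
    "samples collected in Cambridge."
)

if __name__ == "__main__":
    print(find_sequences(SAMPLE_TEXT))
    print(find_dictionary_terms(SAMPLE_TEXT, SAMPLE_DICTIONARIES))
```

A dictionary lookup like this is deliberately simple: a user can add a new domain (for example RRIDs) by supplying another set of terms, without touching the matching code.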
[0] All operations shown run in total of