agINFRA A data infrastructure to support agricultural scientific communities Andreas Drakos, University of Alcala EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

Sep 13, 2014

Presentation of the agINFRA project (www.aginfra.eu) at the EGI-APARSEN workshop, Amsterdam, 4-6 March 2014
“Managing, computing and preserving big data for research”
https://indico.egi.eu/indico/conferenceDisplay.py?confId=2052
Transcript
Page 1: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

agINFRA: A data infrastructure to support agricultural scientific communities

Andreas Drakos, University of Alcala
EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

Page 2: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

Our project

In agINFRA we will:

share agricultural research…

…over a data e-infrastructure

Page 3: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

Agricultural research data

• Primary data:
  – Structured, e.g. datasets as tables
  – Digitized: images, videos, etc.
• Secondary data (elaborations, e.g. a dendrogram)
• Provenance information, incl. authors, their organizations and projects
• Methods and procedures followed
• Reports, including papers
• Secondary documents, e.g. training resources
• Metadata about the above
• Social data: tags, ratings, etc.

Page 4: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

agINFRA values: scientific data must be

A | Open | Must be open and interlinked: NOT subject to barriers, based on standard formats, and avoiding data silos caused by lack of interrelatedness and ad-hoc APIs.
B | Meaningful | Must be meaningful through explicit semantics: reusing the semantics already provided in mature terminologies and ontologies that are exposed and interlinked through the Web.
C | Reliable | Must be reliable, traceable and accessible: any kind of research object can be stored in the data infrastructure, and there are NO barriers to expressing relations between these objects to capture the context of research activities.
D | Actionable | Must be actionable via services that empower research: data is not useful without flexible and adaptable services that allow researchers to act on the data in the ways they need.

Page 5: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

There is a lot of data

Page 6: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

[Diagram: how different content providers join the infrastructure]
• Content provider with a CMS that supports sharing (e.g. OAI-PMH, RSS, ...)
• Content provider with a CMS that does not support sharing (e.g. a proprietary DB): (meta)data export in a proprietary format & mapping to a known format; ingestion in a sharing-compliant tool
• Content provider with an unorganised collection (e.g. listed on a Web site or on DVD-ROM): chooses a sharing-compliant tool
Each provider registers as a data source; content is hosted over agINFRA or computed over agINFRA.
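
The mapping step for a provider whose CMS does not support sharing can be sketched as a field-level crosswalk. This is a minimal illustration, not agINFRA's own tooling: the proprietary field names are hypothetical, and Dublin Core is only assumed as the known target format.

```python
# Hypothetical crosswalk from a provider's proprietary export fields to
# Dublin Core element names; the left-hand field names are invented.
PROPRIETARY_TO_DC = {
    "doc_title": "title",
    "writer": "creator",
    "pub_year": "date",
    "keywords": "subject",
}


def to_dublin_core(record, crosswalk=PROPRIETARY_TO_DC):
    """Map one exported record to Dublin Core elements.

    Every DC element holds a list of strings, because DC elements are repeatable.
    Fields without a mapping are returned separately instead of being dropped.
    """
    dc, unmapped = {}, {}
    for field, value in record.items():
        element = crosswalk.get(field)
        if element is None:
            unmapped[field] = value
            continue
        values = value if isinstance(value, list) else [value]
        dc.setdefault(element, []).extend(str(v) for v in values)
    return dc, unmapped
```

Returning the unmapped fields makes gaps in the crosswalk visible before the records are ingested into a sharing-compliant tool.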

Page 7: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

[Diagram, continued: sharing and aggregation]
• Each provider shares (meta)data, e.g. through OAI-PMH
• Shared sources are indexed & available through the CIARD RING
• A (meta)data aggregator works over the shared (meta)data
• Processing is computed over agINFRA; content is hosted over and served through agINFRA
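
Sharing (meta)data through OAI-PMH means an aggregator pages through a provider's ListRecords responses. The sketch below uses only the Python standard library and assumes Dublin Core (`oai_dc`) records; it builds request URLs and parses one response page, while fetching over the network is left out. Any endpoint URL passed in is a placeholder.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build a ListRecords request; a resumption request carries only the token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token
```

A harvester would loop: request `list_records_url(base, token=token)`, parse the page, and stop once `parse_list_records` returns `None` as the token.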

Page 8: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

[Diagram, continued: computed over agINFRA; hosted over agINFRA…]

Page 9: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

Actors over the infrastructure

[Architecture diagram] Components: Registry of Datasets and APIs; Registry of vocabularies and tools; Productivity Tools; LOD Vocabularies; agINFRA RDF vocabularies; agINFRA LOD KOSs; data sources; collections; APIs; Information services; Grid jobs; Grid workflows; Public REST APIs; Cloud / SaaS tools

Page 10: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

Actors over the infrastructure

[Architecture diagram, same components as on Page 9] Actors: Data providers; Information systems providers; Researchers; Policy makers; Developers; Taxonomists

Page 11: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

An existing data community

• CIARD: a global community movement to make agricultural research information and knowledge publicly accessible to all
  – http://www.ciard.net

agINFRA 2nd Review Meeting, 13th of December 2013

Page 12: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

A core registry service

• CIARD RING (Routemap to Information Nodes and Gateways)
  – a global registry giving access to any kind of information source pertaining to agricultural research for development
  – the principal tool created through CIARD to allow information providers to register their services in various categories and to facilitate discovery of sources of agriculture-related information across the world
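
The registry idea (providers register services in categories, users discover sources by category) can be illustrated with a small in-memory model. This is a hypothetical sketch, not the CIARD RING's actual data model or API; the category labels and URLs are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    url: str
    category: str  # hypothetical category label, e.g. "OAI-PMH endpoint"


class Registry:
    """Minimal in-memory registry: providers register services under a category."""

    def __init__(self):
        self._by_category = defaultdict(list)

    def register(self, service):
        # Ignore exact re-registrations so the registry stays duplicate-free.
        if service not in self._by_category[service.category]:
            self._by_category[service.category].append(service)

    def discover(self, category, keyword=""):
        """Return services in a category whose name contains the keyword."""
        kw = keyword.lower()
        return [s for s in self._by_category.get(category, []) if kw in s.name.lower()]
```

Indexing by category keeps discovery cheap; a real registry would also persist entries and expose them to harvesters and aggregators.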

Page 13: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

New agINFRA RING

Page 14: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

New agINFRA RING

Page 15: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

RING data registry usage scenario 1

• Data aggregators registering their data providers to the CIARD RING
  – asking providers directly to be registered there (AGRIS)
  – federating their own smaller registries (GLN)

Page 16: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

RING data registry usage scenario 2

• New data providers using agINFRA cloud tools can be automatically registered to the CIARD RING
  – cloud-hosted AgriDrupal or AgriOceanDSpace instances for document repositories
  – cloud-hosted agLR instances for learning repositories
• agINFRA cloud hosting services
  – in collaboration with other cloud communities (e.g. OKEANOS/GRNET)
  – in collaboration with the CHAIN-REDS project, etc.

Page 17: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

Data provider scenario 1

[Architecture diagram, same components as on Page 9] A data provider in need of hosting & storage for a small-scale CMS uses a cloud-hosted CMS and sets up its own CMS instance.

Page 18: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

Data provider scenario 2

[Architecture diagram, same components as on Page 9] A data provider in need of large-scale hosting & replication for its CMS requests space/accounts in a large-scale CMS.

Page 19: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

A semantic backbone for agINFRA

• To help all data providers declare, publish & link their metadata properties and value spaces
  – publishing their KOSs using VocBench and their metadata vocabularies using Neologism
  – linking them to existing vocabularies, e.g. AGROVOC for KOSs, Dublin Core for metadata
• Guidelines & tools to support data providers in adopting such a LOD framework
  – e.g. the LODE-BD recommendations
• To provide an entry point to existing relevant vocabularies
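
Linking a provider's vocabularies to existing ones comes down to publishing a few RDF statements. The sketch below emits N-Triples that link local KOS concepts to target concepts with `skos:exactMatch` and local metadata properties to DCMI terms with `rdfs:subPropertyOf`. All IRIs used with it are hypothetical placeholders (including any AGROVOC concept IRI), and the choice of predicates is an assumption: `skos:closeMatch` is the safer option when two concepts are only near-equivalent.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DCTERMS = "http://purl.org/dc/terms/"


def iri_triple(subject, predicate, obj):
    """Format one N-Triples statement whose object is an IRI."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def link_statements(concept_links, property_links):
    """Link local KOS concepts to target concepts and local properties to DCMI terms.

    concept_links: local concept IRI -> target concept IRI (e.g. an AGROVOC concept)
    property_links: local property IRI -> DCMI term name (e.g. "title")
    """
    lines = [iri_triple(local, SKOS + "exactMatch", target)
             for local, target in concept_links.items()]
    lines += [iri_triple(local, RDFS + "subPropertyOf", DCTERMS + term)
              for local, term in property_links.items()]
    return lines
```

`rdfs:subPropertyOf` is used instead of an equivalence claim because it only asserts that every local value is also a valid `dcterms` value, which is usually the weaker and safer commitment.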

Page 20: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

Exposing to the e-infrastructure scenario

[Architecture diagram, same components as on Page 9] A data provider hosting its CMS on its own or on external/commercial infrastructure is interested in exposing its (meta)data to the e-infrastructure.

Page 21: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

agINFRA LOD layer usage scenario 1

• A data owner wants to share their data as Linked Data
• The data owner uses non-LOD vocabularies and KOSs and wants to publish them as LOD and link them to existing vocabularies
• agINFRA offers tools for publishing vocabularies and KOSs

Once the vocabularies are published, all metadata and all concepts have URIs and can be referenced by any other system.
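
Giving every concept a URI is the core of publishing a non-LOD vocabulary. The following is a minimal sketch under stated assumptions: a flat list of English term labels, a hypothetical base URI, and one possible minting policy (a slug of the label). It describes each term as a `skos:Concept` with a `skos:prefLabel` in N-Triples. It is not how VocBench or any agINFRA tool mints URIs.

```python
import re

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def slugify(label):
    """Derive a URI-safe local name from a label (one possible minting policy)."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def publish_terms(base_uri, labels, lang="en"):
    """Mint one URI per term and describe it as a skos:Concept with a prefLabel."""
    uris, lines = {}, []
    for label in labels:
        uri = base_uri + slugify(label)
        if uri in uris.values():
            raise ValueError(f"URI collision for label {label!r}: {uri}")
        uris[label] = uri
        lines.append(f"<{uri}> <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f'<{uri}> <{SKOS}prefLabel> "{escape_literal(label)}"@{lang} .')
    return uris, lines
```

Label-based slugs are readable but change if a label is edited; opaque identifiers (as AGROVOC uses) keep URIs stable, which matters once other systems reference them.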

Page 22: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

agINFRA LOD layer usage scenario 2

• Once KOSs are published, all metadata and all concepts have URIs and can be referenced by any other system
• Data aggregators like AGRIS and GLN can create mash-ups between their core data and other agricultural data types (e.g. germplasm, soil maps, statistics, …) by using the LOD semantic backbone as a crosswalk between metadata formalizations and concepts in different vocabularies

Page 23: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

agINFRA LOD layer usage scenario 2

Example: LOD-based mash-ups in AGRIS

[Diagram] AGRIS bibliographic metadata (topic, thematic metadata, geographic metadata, scientific names, journal) is linked to external sources:
• AGRIS Journals RDF store: info on journal
• DBpedia: info on topic
• FAO Country Profiles: info on country
• World Bank indicators by country: specific indicators on country
• FAO Fisheries: info on species
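
Because the linked records and the external sources share URIs, a mash-up is essentially a join on those URIs. The sketch below assumes each record lists URIs per facet (e.g. "country", "journal") and uses in-memory dictionaries as stand-ins for the external stores; in a LOD setting those lookups would typically be SPARQL queries or URI dereferencing. The facet names and URIs are hypothetical.

```python
def mash_up(record, sources):
    """Attach linked information to a record through shared URIs.

    record: dict whose facet keys (e.g. "country", "journal") hold lists of URIs.
    sources: facet -> {URI: info}; in-memory stand-ins for external LOD stores.
    """
    linked = {}
    for facet, lookup in sources.items():
        # Keep only URIs the source actually describes; unknown URIs are skipped.
        linked[facet] = [lookup[uri] for uri in record.get(facet, []) if uri in lookup]
    return {**record, "linked": linked}
```

The record is copied rather than modified, so the core bibliographic data stays untouched and the enrichment can be recomputed when an external source changes.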

Page 24: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

Workflow architecture

[Diagram] Ariadne harvester → Filtering component → Identification and de-duplication component → Transformation component → Link checking component → Post-processing/Enrichment component
• Harvested and filtered records are stored in a file system (DC, IEEE LOM, MODS XML)
• Identification and de-duplication: gets a unique ID; duplicates are stored in MySQL
• Transformation: stores metadata in JSON
• Link checking: flags records with broken links
• Post-processing/Enrichment: file system (XMLs)
• To be ported to the Grid
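
The middle of this workflow (filtering, identification and de-duplication, transformation to JSON, link checking) can be sketched as a single pass over harvested records. This is a hypothetical illustration, not the Ariadne harvester's code: the filtering rule, the identity key (a hash of normalized title and creator) and the injected link checker are all assumptions, and the enrichment step is omitted.

```python
import hashlib
import json


def record_id(rec):
    """Hypothetical identity key: SHA-1 of the normalized title and creator."""
    key = rec.get("title", "").strip().lower() + "|" + rec.get("creator", "").strip().lower()
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def run_pipeline(harvested, link_ok):
    """Filter, de-duplicate, transform to JSON and link-check harvested records.

    link_ok is a caller-supplied check (e.g. an HTTP HEAD request) so the sketch
    stays offline; enrichment is left out.
    """
    kept, duplicates, broken = [], [], []
    seen = set()
    for rec in harvested:
        if not rec.get("title", "").strip():   # filtering (assumed rule: title required)
            continue
        rid = record_id(rec)
        if rid in seen:                         # identification & de-duplication
            duplicates.append(rec)
            continue
        seen.add(rid)
        kept.append(json.dumps({"id": rid, **rec}, sort_keys=True))  # transformation
        if not link_ok(rec.get("url", "")):     # link checking
            broken.append(rid)
    return kept, duplicates, broken
```

Each stage only depends on the record it receives, which is what makes such a workflow a reasonable candidate for splitting into independent Grid jobs.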

Page 25: agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

Thank you!

Questions