
Post on 28-May-2020


www.eoscpilot.eu

The European Open Science Cloud for Research pilot project is funded by the European Commission, DG Research & Innovation under contract no. 739563

Chairs:

Hermann Lederer (MPCDF) and Volker Beckmann (CNRS)

Panelists

Carlos Oscar Sorzano (CNB, CSIC, ES) - CryoEM

Werner Kutsch (ICOS RI) - ENVRI/ERFI

Andreas Rietbrock (U. Liverpool) - EPOS/VERCE

Erik van den Bergh (EMBL) - EGA Life Sciences Datasets

Rob van der Meer (Astron) - LOFAR data

Parallel Session - Room 211/212

Value and Challenges of Federated Open Science

Carlos Oscar S. Sorzano

Instruct Image Processing Center

http://i2pc.es

Science Demonstrator: Cryo-Electron Microscopy

Federated Open Science Challenges

Solution: Reproducible JSON

[
  {
    "object.className": "ProtImportMicrographs",
    "object.id": "1",
    "filesPath": "/data/movie_?????.mrc"
  },
  {
    "object.className": "ProtPreprocessMicrographs",
    "object.id": "2",
    "inputMicrographs": "1.outputMicrographs",
    "doDownsample": true,
    "downsamplingFactor": 2
  },
  {
    "object.className": "ProtEstimateCTF",
    "object.id": "3",
    "inputMicrographs": "2.outputMicrographs",
    "minDefocus": 0.5,
    "maxDefocus": 4
  }
]
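As an illustration of why such a JSON description is reproducible, the sketch below reads the three protocols and derives their execution order from the input references. It assumes that a string value of the form "<id>.<outputName>" points at the output of the step with that id; that convention is inferred from the slide, and the helper names (dependencies, execution_order) are hypothetical and not part of any workflow engine.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Protocol list as shown on the slide (quotes and boolean normalised to valid JSON).
WORKFLOW_JSON = """
[
  {"object.className": "ProtImportMicrographs", "object.id": "1",
   "filesPath": "/data/movie_?????.mrc"},
  {"object.className": "ProtPreprocessMicrographs", "object.id": "2",
   "inputMicrographs": "1.outputMicrographs",
   "doDownsample": true, "downsamplingFactor": 2},
  {"object.className": "ProtEstimateCTF", "object.id": "3",
   "inputMicrographs": "2.outputMicrographs",
   "minDefocus": 0.5, "maxDefocus": 4}
]
"""


def dependencies(step, known_ids):
    """Return ids of upstream steps referenced as '<id>.<outputName>' (assumed convention)."""
    deps = set()
    for key, value in step.items():
        if key.startswith("object.") or not isinstance(value, str):
            continue
        prefix, sep, _ = value.partition(".")
        if sep and prefix in known_ids:
            deps.add(prefix)
    return deps


def execution_order(steps):
    """Return protocol class names in an order that respects the input references."""
    by_id = {step["object.id"]: step for step in steps}
    graph = {sid: dependencies(step, by_id.keys()) for sid, step in by_id.items()}
    return [by_id[sid]["object.className"]
            for sid in TopologicalSorter(graph).static_order()]


steps = json.loads(WORKFLOW_JSON)
order = execution_order(steps)
print(" -> ".join(order))
```

Because every parameter and every link between steps is explicit in the file, anyone holding the same input movies can rebuild the same processing chain.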

Challenges ahead

Integration challenges

Workflow description:

• Common Workflow Language

• Business Process Execution Language (BPEL)

• Yet Another Workflow Language (YAWL)

• Apache Taverna

• Galaxy

• Knime

• …

Ontology support:

• EDAM ontology

• Ontology coverage

Open science challenges

• Driving force to report process

• Driving force for programs to migrate

Data description:

• Minimum Information (MIBBI, MIAME, MIAPE, …)


Science Demonstrator ENVRI/ERFI


DEMONSTRATOR:

Focus on the dynamics of greenhouse gases, aerosols and clouds and their role in radiative forcing; interoperability between observations and climate modeling; cooperation between environmental research infrastructures.

Improvement of data integration services based on metadata ontologies; model-data integration by use of HPC; petascale data movement; innovative services to compile and compare model output from different sources, especially semi-automatic spatiotemporal scale conversion.

FAIR CHALLENGES:

Findability: Matching of metadata ontologies between NetCDF-CF and in-situ metadata; data quality indicators.

Accessibility: Automated access routines between the RI repositories. For fully open data this is not immediately problematic, but it might require an analysis of the resources and APIs needed.

Interoperability: APIs, service integration, large data transfers, and where to do the processing (how to document it?).

Reusability: How to cite and persistently identify scale-changed data sets? How to transfer knowledge of the data versions used?

ENVRI Radiative Forcing Integration

Organisations & Contacts: Werner Kutsch, Alex Vermeulen (ICOS ERIC), Ari Asmi (ENVRIplus), Paolo Laj (ACTRIS), Stefan Kindermann, IS-ENES2 (DKRZ), Sylvie Joussaume, Sébastien Denvil, IS-ENES2 (IPSL)

Computational Seismology

3D waveform modelling in 3D media

The VERCE platform

Andreas Rietbrock & Federica Magnoni

Alessandro Spinuso, Andre Gemud, Rafiq Saleh, Emanuele Casarotti

EPOS Computational Earth Sciences

Towards Inverse Modeling

HPC Server Cloud / EOSC

Towards inverse modeling: Misfit calculation

Misfit Analysis

Data/Synt Processing

Simulated Synthetics

Data Download (FDSN)

Provenance Validation and Monitoring

Generates W3C-PROV
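The misfit step named above compares recorded data with simulated synthetics. The slides do not say which misfit measure the platform uses, so the following is only a minimal sketch under the assumption of a plain least-squares (L2) waveform misfit; the function name l2_misfit and the sample values are made up for illustration.

```python
def l2_misfit(observed, synthetic, dt):
    """Least-squares waveform misfit: 0.5 * sum((s_i - d_i)**2) * dt.

    observed, synthetic: equally sampled traces of the same length.
    dt: sampling interval in seconds.
    """
    if len(observed) != len(synthetic):
        raise ValueError("traces must have the same number of samples")
    return 0.5 * dt * sum((s - d) ** 2 for s, d in zip(synthetic, observed))


# Hypothetical traces, sampled every 0.1 s.
observed = [0.0, 1.0, 0.0, -1.0]
synthetic = [0.0, 0.5, 0.0, -0.5]
misfit = l2_misfit(observed, synthetic, dt=0.1)
print(misfit)
```

In an inverse-modeling loop, a value like this would be recomputed after each model update and driven towards a minimum.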

Agile Data Intensive Framework

Python library used to describe abstract workflows for distributed data-intensive applications.

Support for composition: Single components may be defined by having their own internal workflows.

Workflows described in dispel4py can be automatically executed in numerous parallel environments.

Docker containers available supporting multiple execution environments (MPI, SharedMemory) and the integration with other workflow systems (e.g. Pegasus)
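The composition idea described above can be sketched in plain Python. This example deliberately does not use the dispel4py API; the stage functions and the compose helper are hypothetical. It only shows how a component that wraps its own internal workflow can be used as a single step of a larger streaming workflow.

```python
from functools import reduce


def compose(*stages):
    """Chain streaming stages; the result is itself a stage, so it can be nested."""
    def pipeline(stream):
        return reduce(lambda s, stage: stage(s), stages, stream)
    return pipeline


def demean(stream):
    """Subtract the mean from every trace in the stream."""
    for trace in stream:
        mean = sum(trace) / len(trace)
        yield [x - mean for x in trace]


def normalise(stream):
    """Scale every trace by its peak absolute amplitude (flat traces are left as is)."""
    for trace in stream:
        peak = max(abs(x) for x in trace) or 1.0
        yield [x / peak for x in trace]


def peak_to_peak(stream):
    """Reduce every trace to its peak-to-peak amplitude."""
    for trace in stream:
        yield max(trace) - min(trace)


# A composite component with its own internal workflow ...
preprocess = compose(demean, normalise)
# ... used as a single step inside a larger workflow.
workflow = compose(preprocess, peak_to_peak)

traces = [[1.0, 3.0, 5.0], [2.0, 2.0, 2.0]]
results = list(workflow(iter(traces)))
print(results)
```

Because the stages only consume and yield stream items, the same abstract description could in principle be mapped onto different execution back ends, which is the property the slide attributes to dispel4py.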

MPI

Deployed on local Clouds (MAP-REDUCE streaming model)

• dispel4py.org

• solution is not supported by OCCI

Supported by EOSC Demonstrator and new EU project DARE

EGA Life science datasets

A third-party dataset (GoNL project) as use case

Reproduction of the original pipeline

Production of an updated pipeline

Containerized versions of both pipelines using Nextflow

Test both pipelines on the use case dataset

EOSC pilot LOFAR SD

EOSCpilot Stakeholder meeting

28 November 2017, Brussels

Rob van der Meer, ASTRON


ASTRON is part of the Netherlands Organisation for Scientific Research (NWO)


EOSC Pilot LOFAR SD

Excellent Science

• Reduce and analyse radio astronomy data:

• From antenna signal to visibilities to images

Main Challenges

• Large volume, complex (multi-step analysis)

• Power users need compute at their data because of volume

• Inexperienced users need guidance with parameters and data sets

• Make it work across platforms and data centers


Approaches to solution

• implementation of Common Workflow Language (CWL) based pipelines

• first results on ‘prefactor pipeline’

The Prefactor pipeline in CWL


Approaches to solution

• implementation of Common Workflow Language (CWL) based pipelines

• first results on ‘prefactor pipeline’

• make them deployable as Singularity containers.

• run on various systems (data centers)

• Pilot project on SURFsara HPC cloud

• Pilot on beta phase of HTP cluster in February 2018
