
First online hangout SC5 - Big Data Europe first pilot-presentation-hangout

Apr 15, 2017





NCSR “DEMOKRITOS”, GREECE
1st SC5 hangout, January 12, 2016


Climate 1st Pilot case: Supporting data-intensive climate research

Pilot under development by NCSR “Demokritos”

Impact on / basis for synergies with:

o SCs that need feedback on the potential effects of climate change for considering adaptation, mitigation and prevention measures
    SC1 - Health
    SC2 - Food and Agriculture
    SC3 - Secure, clean and efficient Energy
    SC4 - Smart, green and integrated Transport
    SC6 - Europe in a changing world - inclusive, innovative and reflective societies
    SC7 - Secure societies - protecting freedom and security of Europe and its citizens


Background & Problem Statement

The extended scientific effort over the last few decades to understand and study the earth’s climate and climate change has produced a huge volume of model (e.g. Global Circulation Models, Regional Models) and observational (e.g. satellite, aircraft, station) data.

The provision of such model data serves an important objective: assessing the potential impacts of climate change on well-being, to inform adaptation, prevention and mitigation measures and to support other policy-making decisions.

The climate research and impact assessment communities need to interface with useful data resources so that they can extract data efficiently and in a timely manner. One of the methodologies applied is dynamical downscaling, which yields values of physical atmospheric variables at smaller spatial and temporal scales.

The present BDE pilot aims to facilitate the process of dynamical downscaling from global climate data to regional / local scales with the support of tools aggregated on the Big Data Europe platform.


Dynamical downscaling

Output of Global Circulation Models (GCMs) run at coarse spatial resolution drives Regional or Local Models run at higher resolution

Aim: To obtain local weather and climate variables needed for impact assessment and planning by decision makers

Assumption: local climate depends on large-scale atmospheric characteristics and local-scale features


Purpose of the Pilot

Provide an intuitive interface between researchers and specific climate data portals and providers.

Search and download climate model and, optionally, observational data, according to user requirements, such as geographic coverage and / or experiments (scenarios).

Set up and orchestrate the execution of the dynamical downscaling process on institutional computational resources, while gathering and managing data products.

Establish a workflow for useful metadata mappings and data lineage.
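
The search step listed above (selecting data by geographic coverage and experiment) can be illustrated with a minimal Python sketch. The `DatasetRecord` and `BoundingBox` types, the dataset identifiers and the `search` function are all hypothetical and only stand in for whatever records a real data portal returns; `rcp45` and `historical` are used as example experiment names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Geographic box in degrees; assumes it does not cross the antimeridian."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two boxes overlap when they overlap in both longitude and latitude.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalogue entry; real portal records carry far more metadata."""
    dataset_id: str
    experiment: str
    coverage: BoundingBox


def search(records, area, experiments):
    """Return records overlapping `area` whose experiment is in `experiments`."""
    wanted = set(experiments)
    return [r for r in records
            if r.experiment in wanted and r.coverage.intersects(area)]


# Made-up catalogue used only to exercise the filter.
catalogue = [
    DatasetRecord("global-rcp45", "rcp45", BoundingBox(-180, -90, 180, 90)),
    DatasetRecord("global-historical", "historical", BoundingBox(-180, -90, 180, 90)),
    DatasetRecord("north-america-rcp45", "rcp45", BoundingBox(-170, 10, -50, 75)),
]
greece = BoundingBox(19.0, 34.0, 29.0, 42.0)  # approximate box, for illustration
matches = search(catalogue, greece, ["rcp45"])
```

In this toy catalogue only the global `rcp45` record both covers the requested area and belongs to the requested experiment.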



The user accesses the climate data portal and defines the geographic area and scenarios.

The user then selects and downloads data on the BDE platform.

The user defines the case-specific parameters for the dynamical downscaling process.
o several of the above options will be predefined for the present pilot

The platform initiates and controls the dynamical downscaling process by invoking the modules that will perform the preprocessing and main computation steps.
o on institutional resources provided and used by the partner
o data mapping and necessary transformations will be addressed at the pre-processing stage, to ensure compatibility of the data with the modelling software

The intermediate and final data products are stored on the BDE platform, while data lineage is also tracked.

The process described can be repeated either from any intermediate step or from the beginning, using data from different climatic models.
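
The workflow described above (ordered steps, stored intermediate products, tracked lineage, restart from any intermediate step) can be sketched in Python as follows. This is an illustration under stated assumptions, not the pilot's implementation: the `Workflow` class, the `fingerprint` helper and the three step functions are hypothetical, and the `downscale` step is a placeholder that does no physical modelling.

```python
import hashlib
import json


def fingerprint(obj):
    """Short content hash used as a stand-in for a stored data product ID."""
    blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


class Workflow:
    """Runs named steps in order, keeping every product and a lineage log."""

    def __init__(self, steps):
        self.steps = steps      # list of (name, function) pairs
        self.products = {}      # step name -> stored output
        self.lineage = []       # one record per executed step

    def run(self, initial=None, start_at=None):
        names = [name for name, _ in self.steps]
        first = 0 if start_at is None else names.index(start_at)
        if first == 0:
            data = initial
        else:
            # Restart from the stored product of the preceding step.
            data = self.products[names[first - 1]]
        for name, func in self.steps[first:]:
            result = func(data)
            self.products[name] = result
            self.lineage.append({
                "step": name,
                "input": fingerprint(data),
                "output": fingerprint(result),
            })
            data = result
        return data


def download(request):
    # Placeholder: a real step would fetch files from a data portal.
    return {"request": request, "values": [280, 282, 284]}


def preprocess(raw):
    # Placeholder: a real step would convert data for the modelling software.
    return {"temperature_K": raw["values"]}


def downscale(prepared):
    # Placeholder only: repeats each coarse value to mimic a finer grid.
    # A real step would invoke the regional model.
    return {"temperature_K": [v for v in prepared["temperature_K"] for _ in range(2)]}


wf = Workflow([("download", download),
               ("preprocess", preprocess),
               ("downscale", downscale)])
final = wf.run(initial={"area": "Greece", "scenario": "rcp45"})
rerun = wf.run(start_at="downscale")  # reuses the stored preprocessed product
```

Each lineage record links a step's input hash to its output hash, so the chain of products can be followed from the final result back to the original request.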


Current practice

No generally agreed workflow to perform the dynamical downscaling process.

Data acquisition, handling and management is often performed in an ad-hoc manner.
o observational and analysis data from external services are transferred to local filesystems

Researchers then preprocess the data using third-party or custom-made scripts.

The preprocessed data (i.e. the resulting set of files) are then fed into the regional model of choice along with a set of suitable parameters.

The regional model is usually run on local computational infrastructure.

The resulting files are used for visualisation and further analysis either on local infrastructure or, data size and task permitting, on the researchers’ machines.

Difficulties can be encountered in managing and archiving the data produced at the various stages.

Difficult to track progress and intermediate data products as well as to encourage reuse and collaboration across disciplines.


Primary content / data involved

This Pilot will make extensive use of publicly available climate data.

Earth System Grid Federation (ESGF)
o international collaboration for the management, dissemination and analysis of model output and observational data
o complete set of CMIP5 data (global climate model simulations)
o most CORDEX data (regional climate model simulations)
o more than 15 portals and more than 45 data nodes worldwide
o IS-ENES operates the European part of ESGF (

European Centre for Medium-Range Weather Forecasts (ECMWF)
o intergovernmental organisation supported by 21 European Member States and 13 cooperating States
o ECMWF’s public data

Certain subsets of ECMWF data are also published via ESGF.


Data structures

This pilot use case will make use of NetCDF data sets
o binary, header-based format
o variable data are stored in multidimensional arrays

Access to relevant datasets is described via THREDDS records on each node’s catalogue.

The data are described using the Climate and Forecast (CF) conventions.

The implementation of this pilot will make use of WRF - the Weather Research and Forecasting model - to perform the dynamical downscaling
o the internal metadata will be expressed in a WRF-compatible format. Appropriate mappings will be put in place so that the conversion is transparent to the user and takes place automatically upon ingestion.
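
The automatic metadata conversion on ingestion can be sketched as a name mapping applied to each incoming variable. In the Python sketch below, the `CF_TO_WRF` table and the `ingest` function are assumptions made for illustration: the keys are CF standard names, but the target names are only plausible WRF-style field names and not the pilot's actual mapping. Variables are modelled as plain dictionaries with nested lists instead of real NetCDF arrays.

```python
# Illustrative mapping from CF standard names to WRF-style field names.
# The pairs are assumptions for this sketch, not the pilot's actual table.
CF_TO_WRF = {
    "air_temperature": "TT",
    "eastward_wind": "UU",
    "northward_wind": "VV",
    "surface_air_pressure": "PSFC",
}


def ingest(variables):
    """Rename variables keyed by CF standard name; report the unmapped ones.

    `variables` maps a CF standard name to a dict with "dims" and "data".
    Returns (converted, unmapped), where `converted` is keyed by target name.
    """
    converted, unmapped = {}, []
    for cf_name, var in variables.items():
        wrf_name = CF_TO_WRF.get(cf_name)
        if wrf_name is None:
            unmapped.append(cf_name)
            continue
        # Keep the original name so the conversion stays traceable.
        converted[wrf_name] = {**var, "cf_standard_name": cf_name}
    return converted, unmapped


# Toy input: each variable is a (time, lat, lon) array stored as nested lists.
incoming = {
    "air_temperature": {
        "dims": ("time", "lat", "lon"),
        "data": [[[280, 281], [282, 283]]],
    },
    "surface_air_pressure": {
        "dims": ("time", "lat", "lon"),
        "data": [[[101300, 101250], [101200, 101150]]],
    },
    "sea_ice_area_fraction": {
        "dims": ("time", "lat", "lon"),
        "data": [[[0, 0], [0, 0]]],
    },
}
converted, unmapped = ingest(incoming)
```

Returning the unmapped names rather than silently dropping them lets the platform report which incoming variables were not converted.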


Current technological infrastructure at the Pilot partner

The pilot partner - NCSR Demokritos - currently operates WRF on a daily basis to produce weather forecasts on 3 nested computational grids covering Europe, Greece and the Attica peninsula.

Initial and boundary conditions are provided by Global Forecast System (GFS) data (from the US). The GFS data are downloaded automatically and pre-processed to produce the necessary input file for running WRF.


Functional roles involved in this Use Case

Research site/centre (stakeholder): use of the functionality of this pilot either as an external service/product or by deploying an internal instance of the BDE platform
o Benefit: increase the potential for its internal processes with minimal disruption and expansion requirements to its infrastructure and internal policies.

Researcher (primary user): effectively search for data, potentially across different providers, perform processing tasks on the BDE platform or on their departmental resources
o gain data provenance information
o increase the efficiency of experimental runs
o increase the value of their experiments (as they will be archivable and retrievable for potential future reference, publication and scientific replication)
o increase the potential for primary users to perform multiple downscaling computations and inter-compare their results


IT infrastructure

Potential configurations for a research institute to use the Pilot:
o As an external set of services: through agreement with the instance provider, provided the pilot services are accessible and can be integrated into current research procedures.
o As an internal infrastructure component: the research institute provides the necessary hardware to be used for hosting the BDE instance.

In showcasing the pilot we aim to follow the first approach, taking advantage of the common administration within NCSR-D.



This pilot aims to aid climate research, and therefore encourage consequent impactful synergies, by providing
o a user-friendly, interactive way to query and fetch data from resources such as ESGF and ECMWF
o a framework to support the incremental, data-oriented carrying out of climate-related experiments, while storing intermediate data products and associated data lineage.

In particular this pilot aims at:
o improving the productivity of researchers
    Easier management, ingestion and transformation of external data
    Overseeing the execution of model runs over the data, making use of existing infrastructure and procedures
    Improved efficiency and reusability in downscaling computational experiments
o creating opportunities for pilots across communities within the BDE platform
    Climate change impact assessment studies on sectors such as energy, food and agriculture are potential future pilot-use cases across societal challenges in the BDE platform.


Climate: Pilot Architecture


Thank you for your attention
