H. Widmann (M&D) Data Discovery and Processing within C3Grid GO-ESSP/LLNL / June, 19th 2006 / 1 Data Discovery and Basic Processing within the German Collaborative Climate Community Data and Processing Grid (C3Grid) Project Heinrich Widmann and Stephan Kindermann Model and Data / DKRZ / Max-Planck-Institute for Meteorology Hamburg, Germany GO-ESSP at LLNL Livermore, June 19th – 21st, 2006 C3Grid Home: www.c3grid.de

Heinrich Widmann and Stephan Kindermann

Jan 07, 2016

Page 1: Heinrich Widmann and Stephan Kindermann

H. Widmann (M&D) Data Discovery and Processing within C3Grid GO-ESSP/LLNL / June, 19th 2006 / 1

Data Discovery and Basic Processing within the German Collaborative Climate Community Data and Processing Grid (C3Grid) Project

Heinrich Widmann and Stephan Kindermann
Model and Data / DKRZ / Max-Planck-Institute for Meteorology
Hamburg, Germany

GO-ESSP at LLNL
Livermore, June 19th – 21st, 2006

C3Grid Home: www.c3grid.de

Page 2: Heinrich Widmann and Stephan Kindermann


Overview

• C3Grid Background
• Data Analysis Workflows
• C3Grid Architecture and Interfaces
• Data Discovery and Metadata in C3Grid
• Data Information Service with Lucene
• Data Access and Preprocessing
• Summary

Page 3: Heinrich Widmann and Stephan Kindermann


C3Grid Background

• C3Grid
– Status: month 10 of 36 (phase 1)
– is the earth system science community grid within the German D-Grid initiative
– D-Grid includes five further community grid projects (AstroGrid, HEP-Grid, InGrid, MediGrid, TextGrid)
– is a community driven grid

Goal is to develop a grid infrastructure appropriate for typical climate analysis workflows

Stepwise introduction and integration

Page 4: Heinrich Widmann and Stephan Kindermann


C3Grid Data Analysis Workflow Requirements

Requirements
• Metadata
• Discovery
• Data access (+ preprocessing)
• Security
• Scheduling
• Complex processing

Grid technologies
• ISO19115 / ISO19139
• OAI-PMH + Lucene
• community webservice
• Shibboleth
• Globus Toolkit 4
• WS-GRAM

Page 5: Heinrich Widmann and Stephan Kindermann


C3Grid Architecture and Interfaces

[Figure: C3Grid architecture and interfaces. Highlighted areas: Data Discovery; Data Access and Basic Processing.]

Page 6: Heinrich Widmann and Stephan Kindermann


C3Grid Data Discovery and Data Access

[Figure: C3Grid data discovery and data access. Labels in the diagram:
• Portal: Discovery, Workflow composition
• C3 Metadata catalog, OAI harvester, OAI-PMH, Web server / OAI provider
• ISO 19115 / 19139 metadata: Discovery, Use
• Data providers: World Data Centers (Climate, Mare, RSAT), DWD, PIK, IFM-Geomar, ..
• Provider storage: DB, Files, Prop. Xml, Prop. Rel.
• Data Access Web Service, preprocessing, data
• Data request: oids, time/space constraints, processing constraints
• Scheduling, Data Management Service
• workspace (at each resource provider), analysis job
• WS-GRAM job submission, Grid Infrastructure Metadata]
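The harvesting path in the diagram (OAI provider at each data center, OAI harvester feeding the C3 metadata catalog) relies on OAI-PMH. As a minimal sketch of what a harvester does, the following builds a `ListRecords` request and reads identifiers and the resumption token from a response. The base URL, the metadata prefix `iso19139`, and the sample response are assumptions made for illustration; they are not taken from the C3Grid implementation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hand-written sample response (assumption), reduced to the parts used here.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai", "iso19139"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A real harvester would repeat the request with the returned token until no token comes back, and would pass each record's metadata payload on to the catalog.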

Page 7: Heinrich Widmann and Stephan Kindermann


C3 ISO 19139 Metadata “Profile”

<MD_Metadata … "http://www.isotc211.org/xxx">
  <fileIdentifier ../>
  <resourceConstraints ../>
  <extent … spatial+temporal bounding box .. />
  <contentInfo ..>
  <attributeDescription ../>
  <distributionInfo ..>
  <DS_Series>
  <composed_of>
  <composed_of>
</MD_Metadata>

<MD_Metadata …. >

<MD_Metadata …. >

Data Items:

• gridded data

[Figure: relation between metadata and data. Labels in the diagram:
• Metadata Database: Metadata, “implicit” Metadata
• Archive Database
• Postprocessed Experiment Data: 2D single variable time series
• Post-processing
• Raw Experiment Data: 3D multi variable files]
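To make the discovery fields of such a record concrete, here is a minimal sketch that reads the file identifier and the spatial bounding box from an ISO 19139 style document with Python's standard library. The element names and the `gmd`/`gco` namespaces follow the published ISO 19139 schema; the sample record is hand-written, and the actual C3Grid profile may name or nest elements differently.

```python
import xml.etree.ElementTree as ET

# Namespaces of the published ISO 19139 XML schema.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hand-written sample record (assumption), reduced to two discovery fields.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>example:dataset:1</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:extent>
        <gmd:EX_Extent>
          <gmd:geographicElement>
            <gmd:EX_GeographicBoundingBox>
              <gmd:westBoundLongitude><gco:Decimal>-10.0</gco:Decimal></gmd:westBoundLongitude>
              <gmd:eastBoundLongitude><gco:Decimal>30.0</gco:Decimal></gmd:eastBoundLongitude>
              <gmd:southBoundLatitude><gco:Decimal>35.0</gco:Decimal></gmd:southBoundLatitude>
              <gmd:northBoundLatitude><gco:Decimal>70.0</gco:Decimal></gmd:northBoundLatitude>
            </gmd:EX_GeographicBoundingBox>
          </gmd:geographicElement>
        </gmd:EX_Extent>
      </gmd:extent>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""

# Maps the short keys used in the result to the ISO 19139 element names.
BBOX_ELEMENTS = {
    "west": "westBoundLongitude",
    "east": "eastBoundLongitude",
    "south": "southBoundLatitude",
    "north": "northBoundLatitude",
}


def extract_discovery_fields(xml_text):
    """Return the file identifier and bounding box of one metadata record."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext(
        ".//gmd:fileIdentifier/gco:CharacterString", namespaces=NS
    )
    bbox = {}
    for key, element in BBOX_ELEMENTS.items():
        path = f".//gmd:EX_GeographicBoundingBox/gmd:{element}/gco:Decimal"
        bbox[key] = float(root.findtext(path, namespaces=NS))
    return {"identifier": identifier, "bbox": bbox}


if __name__ == "__main__":
    print(extract_discovery_fields(SAMPLE))
```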

Page 8: Heinrich Widmann and Stephan Kindermann


C3Grid Data Information Service with Lucene

[Figure: architecture of the Data Information Service (DIS). Labels in the diagram:
• Portal
• Web service frontend: Apache Axis + Servlet Container
• Apache Lucene: full-text index, indexing of selected fields
• cache for ISO19139 documents: <MD_Metadata>...</MD_Metadata> records
• harvesting backend, OAI-PMH, Webserver
• harvested sources: CERA, Pangaea, Archive]

Inverted index:

Field        Term           Document
identifier   ABC:123        2
identifier   XYZ:223        6
identifier   MI6:007        12
abstract     region         2,23,112
abstract     pressure       3,23
abstract     humid          4,33,215,6,4
min_lat      030.43         1
min_lat      -023.23        2
local        file://path/   4

[T. Langhammer, ZIB, Berlin]
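The inverted index on this slide maps a (field, term) pair to the documents that contain it. The following toy sketch illustrates that idea only. It is plain Python and does not use Lucene or its API, it lowercases terms but does no stemming, and the class name, document numbers, and texts are invented for the example.

```python
from collections import defaultdict
import re


class InvertedIndex:
    """Toy field-aware inverted index: (field, term) -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, fields):
        """Index the selected fields of one document."""
        for field, text in fields.items():
            for term in re.findall(r"\w+", text.lower()):
                self._postings[(field, term)].add(doc_id)

    def search(self, field, term):
        """Return sorted ids of documents whose field contains the term."""
        return sorted(self._postings.get((field, term.lower()), set()))

    def search_all(self, field, terms):
        """Return sorted ids of documents whose field contains every term."""
        sets = [self._postings.get((field, t.lower()), set()) for t in terms]
        return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(2, {"abstract": "Sea level pressure by region"})
    index.add(3, {"abstract": "Surface pressure"})
    index.add(23, {"abstract": "Pressure and humidity per region"})
    print(index.search("abstract", "pressure"))
    print(index.search_all("abstract", ["pressure", "region"]))
```

A lookup touches only the posting list of the queried term, which is why a text search over many harvested records stays cheap compared with scanning each ISO 19139 document.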

Page 9: Heinrich Widmann and Stephan Kindermann


C3Grid Portal – Simple search

Page 10: Heinrich Widmann and Stephan Kindermann


C3Grid Portal – Advanced search

Page 11: Heinrich Widmann and Stephan Kindermann


C3Grid Data Access and Preprocessing

• Data access interface
– Community-specific webservice (WSDL)
– Solutions of the individual institutes will be adapted to support the webservice
  • e.g. triggering of local data processing tools
– Support data base and file based storage types
– More detailed use metadata will be provided during the extraction process with the data

Page 12: Heinrich Widmann and Stephan Kindermann


C3Grid Data Access/Preprocessing Interface

[Figure: data access/preprocessing interface. Labels in the diagram:
• SOAP-XML StageFileRequest
• Data Access Web service
• Constraints, necessary processing
• CF standard names, Local variable names
• Access, CDO processing
• DB, Files, data]

Stage file webservice request contains:
• ObjectList of OIDs requested
• CFList of standard names
• Space constraints
• Time constraints
• Target directory
• File format, e.g. netCDF or grib
• …
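As a rough sketch of how a provider might turn such a request into CDO processing, the following models the listed request fields and builds a CDO command line from them. The class and field names, the mapping from CF standard names to local variable names, and the file paths are all hypothetical; only the CDO operators `sellonlatbox`, `seldate`, `selname` and the `-f` output format option are taken from CDO itself. The command is built but not executed.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical mapping from CF standard names to a provider's local names.
CF_TO_LOCAL = {"air_temperature": "temp2", "precipitation_flux": "precip"}


@dataclass
class StageFileRequest:
    """Illustrative model of the request fields listed on the slide."""
    oids: List[str]                           # object ids of requested data
    cf_names: List[str]                       # CF standard names
    bbox: Tuple[float, float, float, float]   # lon1, lon2, lat1, lat2
    time_range: Tuple[str, str]               # start date, end date
    target_dir: str
    file_format: str = "nc"                   # CDO -f value: "nc" or "grb"


def cdo_command(req, infile, outfile_name):
    """Translate the request constraints into a chained CDO call."""
    local_names = [CF_TO_LOCAL[name] for name in req.cf_names]
    lon1, lon2, lat1, lat2 = req.bbox
    start, end = req.time_range
    return [
        "cdo", "-f", req.file_format,
        f"-sellonlatbox,{lon1},{lon2},{lat1},{lat2}",
        f"-seldate,{start},{end}",
        f"-selname,{','.join(local_names)}",
        infile,
        f"{req.target_dir.rstrip('/')}/{outfile_name}",
    ]


if __name__ == "__main__":
    request = StageFileRequest(
        oids=["ABC:123"],
        cf_names=["air_temperature"],
        bbox=(-10.0, 30.0, 35.0, 70.0),
        time_range=("2000-01-01", "2000-12-31"),
        target_dir="/workspace/job42/",
    )
    print(" ".join(cdo_command(request, "/archive/abc123.grb", "abc123.nc")))
```

Resolving an OID to the input file is left out here; in this sketch the caller is assumed to have done that lookup already.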

Page 13: Heinrich Widmann and Stephan Kindermann


Summary

• Grid development is application driven
• Discovery is based on
– ISO 19115/19139 based metadata catalog
– Hierarchical, two-level metadata scheme
– Text based search in the catalog
• Data access is implemented by
– Proprietary C3Grid data access interface (webservice)
• Part of the use metadata is provided along with the data extraction

Page 14: Heinrich Widmann and Stephan Kindermann


The end

Page 15: Heinrich Widmann and Stephan Kindermann


C3Grid Architecture

[Figure: C3Grid architecture. Labels in the diagram:
• User; User Interface: API (Web Services), GUI, Monitoring; Job Submission
• Workflow Scheduler, Resource Scheduler, Matchmaking, Task Execution
• DMS (global), DMS (local), Staging, File Management, Data Transfer Service
• DIS, Search, Harvesting, OAI / WS
• Resource Information Service, Available Resources
• Site C3Grid Components: Archive Interface, Pre-Proc, Grid Workspace
• Distributed Data Archives: DBMS/File, Base Data & Meta Data
• Distributed Processing Resources
• MetaData, JobData
• Distributed Grid Infrastructure: GT4 based, new Metadata-Service]