NERC DataGrid Status: ESP June 2004

Jan 12, 2016

Transcript
Page 1: NERC DataGrid Status: ESP June 2004

Bryan Lawrence on behalf of the NDG, BADC and BODC.

Ray Cramer, Marta Gutierrez, Kerstin Kleese, Siva Kondapalli, Sue Latham, Roy Lowry, Kevin O’Neill, Ag Stephens, Andrew Woolf

NERC DataGrid Status: ESP June 2004

British Atmospheric Data Centre

http://badc.nerc.ac.uk

Page 2: NERC DataGrid Status: ESP June 2004

Outline

• NDG Aims and Metadata Taxonomy (Review)
• Demonstration of NDG in action (no grid services yet, but the shape of things to come should be clear)
  – “Stub-B”
  – New Tool: DataExtractor
• Status
  – Issues with metadata
    • Chemistry data at BADC
    • Numerical Simulation Discovery
    • & back to Status

Page 3: NERC DataGrid Status: ESP June 2004

http://ndg.nerc.ac.uk

British Atmospheric Data Centre

British Oceanographic Data Centre

Simulations

Assimilation

Complexity + Volume + Remote Access = Grid Challenge

Page 4: NERC DataGrid Status: ESP June 2004

NDG Metadata Taxonomy

Page 5: NERC DataGrid Status: ESP June 2004

Page 6: NERC DataGrid Status: ESP June 2004

NDG Metadata Architecture

Service based model:
• clear separation between discovery and use
• discovery service standards compliant and interoperable

Page 7: NERC DataGrid Status: ESP June 2004

(D) - Discovery

NDG Supports Multiple Discovery Services – “build your own”

OAI: Open Archives Initiative – Digital Library Protocol for harvesting metadata.

[Diagram: NDG Discovery Service Element. Labels: Existing Metadata; XSLT Ingest Transformation; Intermediate Schema Document(s) (XML); XSLT Processor (×3); Directory Interchange Format; Dublin Core; GEO Profile (Z39.50); ISO 19115?; Catalogue Interoperability Protocol?; OAI.]

Multiple Protocol Support will be built into the “NDG Vanilla Discovery Service”
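The idea of deriving several discovery formats from one intermediate XML document can be sketched roughly as below. The slide names XSLT as the transformation mechanism; this sketch swaps in plain Python (`xml.etree.ElementTree`) so that it runs with the standard library alone. The intermediate element names (`record`, `title`, `centre`, `summary`) and the field mapping are assumptions made for illustration, not the NDG intermediate schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical intermediate record; element names are illustrative only.
INTERMEDIATE = """
<record>
  <title>Example dataset</title>
  <centre>BADC</centre>
  <summary>Placeholder description</summary>
</record>
"""

# Assumed mapping from intermediate fields to Dublin Core elements.
FIELD_TO_DC = {"title": "title", "centre": "publisher", "summary": "description"}


def to_dublin_core(intermediate_xml: str) -> ET.Element:
    """Copy the mapped fields of one intermediate record into a Dublin Core record."""
    source = ET.fromstring(intermediate_xml)
    target = ET.Element("metadata")
    for field, dc_name in FIELD_TO_DC.items():
        node = source.find(field)
        if node is not None and node.text:
            child = ET.SubElement(target, f"{{{DC_NS}}}{dc_name}")
            child.text = node.text.strip()
    return target


dc_record = to_dublin_core(INTERMEDIATE)
print(ET.tostring(dc_record, encoding="unicode"))
```

A second output format would be another mapping table applied to the same intermediate record, which is the role the slide gives to the separate XSLT processors.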

Page 8: NERC DataGrid Status: ESP June 2004

NDG Structure: NDG Query and Data Delivery

[Flow diagram. Labels: User/Software Generates Query; NDG Portal; Query Type (Discovery, Detailed); Query Destination; One or more Local NDG DB; Browse and redefine query; Deliver one or more documents to user; Documents and Annotations; Define Data Request, Q; Local NDG DB exists? (Y); Ingest A; Extract Data; Physical Data; Deliver Data. Other labels: XML; A; B; C; D; Q.]

Note that definitions A do not need to match any ingested A.

Page 9: NERC DataGrid Status: ESP June 2004

[Architecture diagram. Labels: Wider Internet; NERC Grid; BADC NDG Wrapper; BODC NDG Wrapper; Group NDG Wrapper; XML database (×4); Online Data (×3); tape robot; Satellite; Supercomputer; Research Group Data Sources; Software Agent; Grid User; Internet User; Internet Link (×2); ESG (& other) Applications; NDG Web Portal.]

Page 10: NERC DataGrid Status: ESP June 2004

Page 11: NERC DataGrid Status: ESP June 2004

Flexible Information Return

• Can order responses by title or data centre (or default random)
• Choose to return either data or “B-” metadata
• Look at DIFs in either HTML or XML

Page 12: NERC DataGrid Status: ESP June 2004

Page 13: NERC DataGrid Status: ESP June 2004

Page 14: NERC DataGrid Status: ESP June 2004

Page 15: NERC DataGrid Status: ESP June 2004

Page 16: NERC DataGrid Status: ESP June 2004

Page 17: NERC DataGrid Status: ESP June 2004

Page 18: NERC DataGrid Status: ESP June 2004

Page 19: NERC DataGrid Status: ESP June 2004

Page 20: NERC DataGrid Status: ESP June 2004

Role of B metadata: domain ontology

• B metadata is a store of metadata intended to:
  – Allow the production of the various “industry standard” discovery formats (DIF, DC, FGDC/GEO, 19115)
  – Provide a more complete metadata store than that demanded by the usual discovery formats, leveraging the metadata holdings of the data centres
  – Allow a smooth link across to the data browse and use elements of the NDG
• Expected to expand in importance as we add more semantic detail to the schema

Page 21: NERC DataGrid Status: ESP June 2004

B metadata – a simplified view

[Diagram. Entity groups:]

• Activity, e.g. Project, Programme, Experiment
• Observation station
  – Stationary, e.g. Land station, Mooring
  – Moving, e.g. Ship, Aircraft
• Data production tools
  – Models, e.g. UM
  – Instrument, e.g. CTD, Sonde, Lidar
• Dataset types: Simulation, Analysis, Measurement
• Data Entities
  – Basic Data Entities, e.g. Lagrangian path, Ensemble, Sample, Profile, Section, n-dimensional dataset
  – Derived Data Entities, e.g. Climatology, Timeseries, Integration
• Common entities: Organisation, People, Roles, Spatio-temporal (e.g. Area, Place, Grid, Point, Trajectory, DateTime)

[Relationship labels: Includes / Included-in; Is-an; Deploys-a / Deployed-on-a; Produces / Output-by.]

Page 22: NERC DataGrid Status: ESP June 2004

How is the B metadata implemented?

Core linking concept is the deployment:

a Deployment, on behalf of an Activity, of a Data Production Tool at an Observation Station, that produces a Data Entity.

[Diagram labels: Deployment; Activity; Data Production Tool; Observation Station; Data Entity.]

Each of the main metadata objects has security data attached to it, so access control can be applied to queries on the metadata.

The deployment links the metadata records into a structure that can be turned into navigable XML using XQuery or XSLT, with any of the record types as the root element.
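A minimal sketch of the deployment as a linking record, and of re-rooting the same links at different record types, might look like the following. The slide names XQuery or XSLT for this step; plain Python is used here only so the example runs with the standard library. All identifiers (`activity-1`, `tool-ctd`, and so on) and the XML element names are hypothetical.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Deployment:
    """One deployment: a tool at a station, for an activity, producing a data entity."""
    activity: str
    tool: str
    station: str
    data_entity: str


# Hypothetical deployment records; the IDs are placeholders.
DEPLOYMENTS = [
    Deployment("activity-1", "tool-ctd", "station-ship", "entity-profile-1"),
    Deployment("activity-1", "tool-lidar", "station-land", "entity-profile-2"),
]


def navigable_xml(root_type: str, root_id: str) -> ET.Element:
    """Build an XML tree rooted at one record, listing what its deployments link to."""
    root = ET.Element(root_type, id=root_id)
    for dep in DEPLOYMENTS:
        if getattr(dep, root_type) != root_id:
            continue
        node = ET.SubElement(root, "deployment")
        for field in ("activity", "tool", "station", "data_entity"):
            if field != root_type:
                ET.SubElement(node, field, id=getattr(dep, field))
    return root


by_activity = navigable_xml("activity", "activity-1")
by_station = navigable_xml("station", "station-ship")
print(ET.tostring(by_activity, encoding="unicode"))
```

The same list of deployments yields a tree rooted at an activity or at a station, which is the property the slide attributes to using the deployment as the core link.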

Page 23: NERC DataGrid Status: ESP June 2004

“Stub B” – what is it?

“B” metadata works well in databases, but what about:
– presentation
– standalone generation of “D”
– storing metadata locally as files

A raw B record for a Data Entity contains just:
– the basic data entity details
– a series of references to related records

and no details such as:
– activity name,
– instrument name,
– station

“Stub B” is the base entity expanded through its own related deployments and internal references.
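The expansion from a raw B record to a “stub B” can be illustrated with a small sketch. The record layout, the keys, and the function name `make_stub_b` are assumptions for illustration; the slides do not specify the storage format.

```python
# Hypothetical in-memory "B" store keyed by record ID.
RECORDS = {
    "act-1": {"type": "activity", "name": "Example activity"},
    "tool-1": {"type": "tool", "name": "Example instrument"},
    "stn-1": {"type": "station", "name": "Example station"},
    "ent-1": {
        "type": "data_entity",
        "name": "Example profile",
        "deployments": ["dep-1"],  # a raw record holds references only
    },
}

# Hypothetical deployment records linking the other record types.
DEPLOYMENTS = {
    "dep-1": {"activity": "act-1", "tool": "tool-1", "station": "stn-1"},
}


def make_stub_b(entity_id):
    """Expand a data entity through its deployments, replacing references by names."""
    raw = RECORDS[entity_id]
    stub = {"id": entity_id, "name": raw["name"], "deployments": []}
    for dep_id in raw.get("deployments", []):
        refs = DEPLOYMENTS[dep_id]
        stub["deployments"].append(
            {role: RECORDS[ref_id]["name"] for role, ref_id in refs.items()}
        )
    return stub


stub = make_stub_b("ent-1")
print(stub)
```

The resulting dictionary is self-contained, so it could be presented, saved to a file, or transformed without further lookups in the B database.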

Page 24: NERC DataGrid Status: ESP June 2004

Role of Stub B

• Makes application developers’ lives easier, especially in the presentation of search results
• Allows off-line storage of metadata by users
• Basis of D production via XSLT
• Hook into main B repositories
• Potential discovery format (while there are lots around already… this could allow more “discipline dependent” discovery)

Page 25: NERC DataGrid Status: ESP June 2004

Discovery Metadata Usage

[Diagram. Labels: Data Centre 1 and Data Centre 2, each with a Local “B” metadata store and a Local Discovery metadata store (DIF), linked by “Generate”; OAI Harvesting; Discovery metadata (D) store (DIFs); Query interface; Query; Results; Query local B metadata, receive “stub B” reply (×2); Data Browse and use (“A” metadata); Investigate data in detail; Local disc; Store queries/results to local storage.]

Page 26: NERC DataGrid Status: ESP June 2004

Page 27: NERC DataGrid Status: ESP June 2004

Page 28: NERC DataGrid Status: ESP June 2004

Page 29: NERC DataGrid Status: ESP June 2004

Page 30: NERC DataGrid Status: ESP June 2004

Page 31: NERC DataGrid Status: ESP June 2004

Background activity being parallelised with GODIVA/CCLRC e-science collaboration (spectral -> gridpoint + CDMS + visualisation tools)

Download either plot or the data that went into the plot.

Page 32: NERC DataGrid Status: ESP June 2004

Where are we?

• Major effort on defining feature types for observation types so we can build an OGC/ISO compatible data extractor for observations and numerical data.
  • Main thrust for Andrew Woolf and 0.5 New FTE
  • Ag Stephens contributing when time available
• Security Infrastructure Development
  – Collaboration with CCLRC e-science, ECOGrid and 0.5 FTE
• Ongoing work on metadata definition and population:
  – Oceanographic data
    • Siva Kondapalli
  – Chemistry data
    • Main thrust for Sue Latham
  – Numerical Modelling data
    • DIF numerical definition (moving to ISO), BADC and UK Community
    • Katherine Bouton’s work at NCAS/CGAM
  – Remote Sensing Data
    • Collaboration with NEODC and PML
• Ongoing work on databases and interfaces, DIF to ISO and “B”
  • Kevin O’Neill and Marta Gutierrez

Page 33: NERC DataGrid Status: ESP June 2004

Authorisation

• Role-based access:

<dataset>
  <host> badc.nerc.ac.uk </host>
  <name> ukmo-obs </name>
  <access-requires> researcher </access-requires>
  <access-requires> ukmo-obs </access-requires>
  <processing-requires> nerc </processing-requires>
</dataset>

• Key concept: Only hosts that trust each other share data, even within a larger virtual organisation, e.g. at BADC:

<trusted>
  <bodc>
    <host>ndg.bodc.nerc.ac.uk</host>
    <attribute remotename="nerc"> nerc </attribute>
    <attribute remotename="ashoe"> ashoe </attribute>
    <attribute remotename="staff"> nerc </attribute>
    <other> bodc </other>
  </bodc>
</trusted>

Signed “conditions of use” form exists for this dataset
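How the two fragments above might combine can be sketched as follows: attributes asserted by a trusted remote host are mapped to local roles, and the local roles are then compared with what the dataset requires. The slides do not state the matching rules, so three points are assumptions here: every `<access-requires>` role must be held, `<other>` is the fallback role for unlisted remote attributes, and an unlisted host receives no roles. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DATASET_XML = """
<dataset>
  <host>badc.nerc.ac.uk</host>
  <name>ukmo-obs</name>
  <access-requires>researcher</access-requires>
  <access-requires>ukmo-obs</access-requires>
  <processing-requires>nerc</processing-requires>
</dataset>
"""

TRUSTED_XML = """
<trusted>
  <bodc>
    <host>ndg.bodc.nerc.ac.uk</host>
    <attribute remotename="nerc">nerc</attribute>
    <attribute remotename="ashoe">ashoe</attribute>
    <attribute remotename="staff">nerc</attribute>
    <other>bodc</other>
  </bodc>
</trusted>
"""


def local_roles(trusted_xml, remote_host, remote_attrs):
    """Map a remote user's attributes to local roles; untrusted hosts get none."""
    for site in ET.fromstring(trusted_xml):
        if site.findtext("host", "").strip() != remote_host:
            continue
        mapping = {a.get("remotename"): a.text.strip() for a in site.findall("attribute")}
        fallback = site.findtext("other")
        roles = set()
        for attr in remote_attrs:
            if attr in mapping:
                roles.add(mapping[attr])
            elif fallback:
                roles.add(fallback.strip())  # assumption: <other> is a catch-all role
        return roles
    return set()


def may_access(dataset_xml, roles):
    """Assumption: all <access-requires> roles must be held (logical AND)."""
    dataset = ET.fromstring(dataset_xml)
    required = {e.text.strip() for e in dataset.findall("access-requires")}
    return required <= set(roles)


bodc_staff = local_roles(TRUSTED_XML, "ndg.bodc.nerc.ac.uk", ["staff"])
print(bodc_staff, may_access(DATASET_XML, bodc_staff))
```

Under these assumptions a BODC staff member maps to the local role `nerc`, which is not enough on its own to read `ukmo-obs`.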

Page 34: NERC DataGrid Status: ESP June 2004

NDG Security

Certificate based, pass encrypted credentials between user and gatekeeper.

Page 35: NERC DataGrid Status: ESP June 2004

Extending the CF convention for chemistry …

grep -i sulphate vars2.csv
"Allen, Andrew and Grenfell, Lee ", Sulphate / coarse (ug/m3)
"Allen, Andrew and Grenfell, Lee ", Sulphate / fine (ug/m3)
"Bradbury, Carl ", SULPHATE LOADING (ug/m3)
"James, Jonathan And Allen, Andrew ", Sulphate / coarse (ug/m3)
"James, Jonathan And Allen, Andrew ", Sulphate / fine (ug/m3)
"James, Jonathan And Allen, Andrew ", Sulphate / fine+coarse (ug/m3)
"James, Jonathan And Allen, Andrew ", Sulphate / fine+coarse (ug/m3)
"McArdle, Nicola and Thompson, Adrian ", sulphate (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate (µM)
"McArdle, Nicola and Thompson, Adrian ", sulphate <1.1 µm diameter (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate <1.2 µm diameter (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate <1µm diameter (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate > 1µm diameter (nmol-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate >1.1 µm diameter (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate >1.2 µm diameter (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate bulk (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate bulk (nmol m-3)
"McFadyen, Gordon ", Sulphate
"Robertson, Leonie and Davison, Brian ", Coarse sulphate concentration (ug m-3)
"Robertson, Leonie and Davison, Brian ", Fine sulphate concentration (ug m-3)

•Currently 35,000 Ames format files, mostly Atmospheric Chemistry …

• Real problems with vocabulary, and units …

• Spinning up a new project …

• … need community help!

grep -i butane vars2.csv
… i-Butane (ppt)
… iso-Butane (ppt)
… n-Butane (ppt)
… iso-Butane pptv
… ISO-BUTANE (pptv)
… i-,n-butane

ACSOE just one of many datasets with this problem …
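The vocabulary and units problem shown by the grep output can be made concrete with a small normalisation sketch. The canonical names and the unit alias table below are hypothetical placeholders, not CF standard names or an agreed BADC vocabulary; labels that match no rule are returned as unmapped so that a person can decide.

```python
import re

# Hypothetical controlled vocabulary: (pattern, canonical name). Not CF standard names.
CANONICAL = [
    (re.compile(r"sulphate", re.I), "sulphate"),
    (re.compile(r"^(i|iso)-butane", re.I), "iso-butane"),
    (re.compile(r"^n-butane", re.I), "n-butane"),
]

# Purely notational unit aliases (assumed equivalent spellings of the same unit).
UNIT_ALIASES = {"ug/m3": "ug m-3"}

# Units either in trailing parentheses or as a bare trailing "ppt"/"pptv".
UNIT_RE = re.compile(r"\(([^()]*)\)\s*$|\s(pptv?)\s*$")


def normalise(label):
    """Return (canonical_name, unit); either part is None when it cannot be determined."""
    label = label.strip()
    name = next((canon for pattern, canon in CANONICAL if pattern.search(label)), None)
    unit = None
    match = UNIT_RE.search(label)
    if match:
        unit = (match.group(1) or match.group(2)).strip()
        unit = UNIT_ALIASES.get(unit, unit)
    return name, unit


for raw in ["Sulphate / coarse (ug/m3)", "ISO-BUTANE (pptv)", "iso-Butane pptv", "i-,n-butane"]:
    print(raw, "->", normalise(raw))
```

The sketch deliberately loses the size-fraction detail (coarse, fine, particle diameter), which a real vocabulary would need to carry separately.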

Page 36: NERC DataGrid Status: ESP June 2004

DRAFT DIF Component (1)

Key New Groups:

• Numerical Model
  – ID Information
  – Numerical Model Components
    • (from) Atmosphere, Ocean-Dynamic, Ocean-Thermodynamic, Cryosphere, Land-Surface with possible appends: Chemistry, 4D-VAR, 3D-VAR, QG
    • … details for each
• Numerical Simulation
  – ID Information
  – Initial Condition Information
    • … details
  – Forcing Information
    • … details

Page 37: NERC DataGrid Status: ESP June 2004

DRAFT DIF Component (2)

*=required
+=repeatable

Group: Numerical_Model
  Model_Name:
  Model_Version:
  * Model_Calendar: [model calendar valid] - eg CF calendar or ISO
  *+ Group: Model_Component
    * Model_Component_type: [Model Component Valid]
    Model_Component_Resolution:
    Group: Model_Component_VerticalDomain
      VerticalDomain_Top:
      VerticalDomain_Bottom:
    End_Group
    Model_Component_Timestep:
    Group: Model_Component_Summary
      [Multiple text lines allowed]
    End_Group
  End_Group
  URL:
End_Group

Example

Group: Numerical_Model
  Model_Name: HadCM3
  Model_Calendar: 360 day
  Group: Model_Component
    Model_Component_type: Atmosphere
    Model_Component_Resolution: 2.5 degrees latitude, 3.75 degrees longitude
    Group: Model_Component_VerticalDomain
      VerticalDomain_Top: 1000 hPa
      VerticalDomain_Bottom: 4 hPa
    End_Group
    Model_Component_Timestep: 0.5 hours
    Group: Model_Component_Summary
      Cullen et al Atmospheric Model adapted for climate use.
    End_Group
  End_Group
  Group: Model_Component
    Model_Component_type: Ocean
    Model_Component_Resolution: 1 degrees latitude, 1 degrees longitude
    Group: Model_Component_VerticalDomain
      VerticalDomain_Top: 0m
      VerticalDomain_Bottom: 6000m
    End_Group
    Model_Component_Timestep: 0.5 hours
    Group: Model_Component_Summary
      Bryan and Cox Ocean model
    End_Group
  End_Group
  URL: http://www.metoffice.com/hadcm3 (e.g.)
End_Group
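The Group / End_Group nesting used in the draft lends itself to a simple stack-based reader. The sketch below is one possible way to load such text into nested Python dictionaries; the function name `parse_groups` and the output layout are assumptions, and it is not an official DIF parser. The sample text is an abridged copy of the HadCM3 example from the slide.

```python
EXAMPLE = """
Group: Numerical_Model
  Model_Name: HadCM3
  Model_Calendar: 360 day
  Group: Model_Component
    Model_Component_type: Atmosphere
    Model_Component_Timestep: 0.5 hours
    Group: Model_Component_Summary
      Cullen et al Atmospheric Model adapted for climate use.
    End_Group
  End_Group
  Group: Model_Component
    Model_Component_type: Ocean
    Model_Component_Timestep: 0.5 hours
  End_Group
  URL: http://www.metoffice.com/hadcm3 (e.g.)
End_Group
"""


def parse_groups(text):
    """Parse Group/End_Group text into nested dicts (name, fields, groups, text)."""
    root = {"name": None, "fields": {}, "groups": [], "text": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Group"):
            # Accept both "Group: Name" and "Group Name:" spellings.
            name = line[len("Group"):].strip(" :")
            group = {"name": name, "fields": {}, "groups": [], "text": []}
            stack[-1]["groups"].append(group)
            stack.append(group)
        elif line == "End_Group":
            if len(stack) == 1:
                raise ValueError("End_Group without a matching Group")
            stack.pop()
        elif ":" in line:
            # Split on the first colon only, so URLs keep their "://".
            key, value = line.split(":", 1)
            stack[-1]["fields"][key.strip()] = value.strip()
        else:
            stack[-1]["text"].append(line)  # free text, e.g. a summary line
    if len(stack) != 1:
        raise ValueError("Group opened but never closed")
    return root["groups"]


models = parse_groups(EXAMPLE)
model = models[0]
component_types = [g["fields"]["Model_Component_type"] for g in model["groups"]]
print(model["fields"]["Model_Name"], component_types)
```

One limitation of this approach: a free-text summary line that happens to contain a colon would be read as a field, so a stricter reader would treat everything inside a summary group as text.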

Page 38: NERC DataGrid Status: ESP June 2004

Draft DIF Components (3)

*=required
+=repeatable

Group: Numerical_Simulation
  Numerical_Simulation_Name:
  * Numerical_Simulation_ID: [recorded using PURL]
  * Group: run_period
    * start_date: [yyyy-mm-dd-hh]
    * end_date: [yyyy-mm-dd-hh]
    * real_date: [yes,no]
  End_Group
  * Group Initial_Condition:
    Ensemble: [Numeric Value]
    Summary:
  End_Group
  *+ Group Forcing:
    Ensemble: [Numeric Value]
    Ensemble Parent : [uri or 0]
    Summary:
  End_Group
End_Group

Example

Group: Numerical_Simulation
  Numerical_Simulation_Name: All Forcings
  Numerical_Simulation_ID: format_tbd_but_using_purl
  Group: run_period
    start_date: 1859-12-01-00
    end_date: 1999-11-30-00
    real_date: yes
  End_Group
  Group Initial_Condition:
    Ensemble: 4
    Ensemble Parent: 0
    Summary: initial conditions taken from the HadCM3 control integration
  End_Group
  Group Forcing:
    Summary: Volcanic forcing from Sato et al
  End_Group
  Group Forcing:
    Summary: Solar Forcing from Lean et al
  End_Group
  Group Forcing:
    Summary: CO2 from...
  End_Group
  Group Forcing:
    Summary: Anthropogenic SO2 from...
  End_Group
End_Group

Example Ensemble Member

Group: Numerical_Simulation
  Numerical_Simulation_Name: Ensemble Member of blah
  Group: run_period
    start_date: 1859-12-01-00
    end_date: 1999-11-30-00
    real_date: yes
  End_Group
  Group Initial_Condition:
    Ensemble: 1
    Ensemble Parent: format_tbd_but_using_purl
    Summary: initial conditions taken from the HadCM3 control integration
  End_Group
  Group Forcing:
    Summary: Volcanic forcing from Sato et al
  End_Group
  Group Forcing:
    Summary: Solar Forcing from Lean et al
  End_Group
  Group Forcing:
    Summary: CO2 from...
  End_Group
  Group Forcing:
    Summary: Anthropogenic SO2 from...
  End_Group
End_Group
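The required (*) markers in the draft suggest a simple completeness check. The sketch below assumes a simulation record has already been loaded into a dictionary with the layout shown; that layout and the function name `check_simulation` are hypothetical. Only the date pattern yyyy-mm-dd-hh is checked, not calendar validity, since a model may use a 360-day calendar.

```python
import re

# Matches the yyyy-mm-dd-hh layout only; day/month ranges depend on the model calendar.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}-\d{2}$")


def check_simulation(sim):
    """Return a list of problems with the fields the draft marks as required."""
    problems = []
    if not sim.get("Numerical_Simulation_ID"):
        problems.append("missing Numerical_Simulation_ID")
    period = sim.get("run_period")
    if period is None:
        problems.append("missing run_period")
    else:
        for key in ("start_date", "end_date"):
            if not DATE_RE.match(period.get(key, "")):
                problems.append(f"{key} is not in yyyy-mm-dd-hh form")
        if period.get("real_date") not in ("yes", "no"):
            problems.append("real_date must be yes or no")
    if "Initial_Condition" not in sim:
        problems.append("missing Initial_Condition")
    if not sim.get("Forcing"):
        problems.append("at least one Forcing group is required")
    return problems


# Values taken from the "All Forcings" example on the slide.
all_forcings = {
    "Numerical_Simulation_Name": "All Forcings",
    "Numerical_Simulation_ID": "format_tbd_but_using_purl",
    "run_period": {"start_date": "1859-12-01-00", "end_date": "1999-11-30-00", "real_date": "yes"},
    "Initial_Condition": {"Ensemble": 4, "Summary": "initial conditions taken from the HadCM3 control integration"},
    "Forcing": [{"Summary": "Volcanic forcing from Sato et al"}],
}

# The ensemble-member example as it appears on the slide carries no simulation ID.
ensemble_member = {k: v for k, v in all_forcings.items() if k != "Numerical_Simulation_ID"}

print(check_simulation(all_forcings))
print(check_simulation(ensemble_member))
```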

Page 39: NERC DataGrid Status: ESP June 2004

Where are we?

• Major effort on defining feature types for observation types so we can build an OGC/ISO compatible data extractor for observations and numerical data.
  • Main thrust for Andrew Woolf and 0.5 New FTE
  • Ag Stephens contributing when time available
• Security Infrastructure Development
  – Collaboration with CCLRC e-science, ECOGrid and 0.5 FTE
• Ongoing work on metadata definition and population:
  – Oceanographic data
    • Siva Kondapalli
  – Chemistry data
    • Main thrust for Sue Latham
  – Numerical Modelling data
    • DIF numerical definition (moving to ISO), BADC and UK Community
    • Katherine Bouton’s work at NCAS/CGAM
  – Remote Sensing Data
    • Collaboration with NEODC and PML
• Ongoing work on databases and interfaces, DIF to ISO and “B”
  • Kevin O’Neill and Marta Gutierrez

Page 40: NERC DataGrid Status: ESP June 2004

(B) Metadata Model

[Diagram. Entities: Activity; Observation station types; Data production tools; Dataset types; Basic data entities; Derived data entities; Data Granules; Common Data Entities (dimensions (spatial/temporal), grids, organisations, people, places/areas). Relationship labels: Includes / Included-in; Can-be-aggregated-in; Produces / Output-by; Produces / Output-at; Deploys-a / Deployed-on-a.]

Page 41: NERC DataGrid Status: ESP June 2004

(B) Metadata Model Overview

[Diagram. Tiers:]

• Tier-0 - Activity
• Tier-1 - Observation station types: Stationary, Moving
• Tier-2 - Data production tools: Instrument, Model
• Tier-3 - Dataset types: Simulation, Analysis, Measurement
• Tier-4 - Basic data entities: Section, Profile, Lagrangian path, Ensemble, Sample, N-dimensional dataset
• Tier-5 - Derived data entities: Timeseries, Climatology, Integration
• Common entities: Person, Organisation, Role; Grid, Time, Trajectory, Point, Area, Place; Spatial dimensions

[Legend: Inter-tier relationship; Directed relationship; Spatiotemporal entity; Entity with DIF record; Data owning entity; GIS/ISO Feature Types.]

[Relationship labels: Includes / Included-in; Can-be-aggregated-in; Integrated-into; Produces / Output-by; Collects / Can-be-collected-in; Deploys-a / Deployed-on-a; Is-time-ordered-series-of; Follows-a; Superset-of / Subset-of; Processed-to-a.]

Page 42: NERC DataGrid Status: ESP June 2004

(A) NDG Semantic Data Model

[Diagram. Labels: Dataset; Variables; Multidimensional array ... of other arrays ... or from aggregated storage.]

Rich spatiotemporal referencing (standards-compliant: ISO19108, ISO19111)