
The NERC Metadata Gateway: a product of the NERC DataGrid

Jan 17, 2016

Page 1: The NERC Metadata Gateway:  a product of the NERC DataGrid

BADC, BODC, CCLRC, PML and SOC

The NERC Metadata Gateway: a product of the NERC DataGrid

Bryan Lawrence

(on behalf of a big team)

Page 2: The NERC Metadata Gateway:  a product of the NERC DataGrid

TECO-WIS, Nov 2006

Outline

• Introduction to NERC, the NERC Data Centres, and NCAS
• The NERC DataGrid Project
  – Key Components:
    • Data Tools, Data Discovery, {Access Control}
  – NDG Information Environment
    • Key Standards Structures: the ISO Family
    • From CSML, {MOLES}, DIF to ISO19139 (NumSim)
• Distributed Content Search
  – Why we did it this way
  – Our Discovery Architecture
• NDG Discovery
  – Now … and
  – The Future
  – The “New NERC Metadata Gateway”
• ISO19139 Best Practice
• Summary

Page 3: The NERC Metadata Gateway:  a product of the NERC DataGrid

Some Introductions

• NERC: The Natural Environment Research Council
  – The major player in UK environmental research
  – Is both a funding agency and a conglomeration of “centres”:
    • Internal “research” institutes. The British Oceanographic Data Centre (BODC) is part of one of the internal institutes.
    • External “collaborative” centres, which include:
      – The Plymouth Marine Laboratory
      – The National Oceanography Centre, Southampton
      – The National Centre for Atmospheric Science (NCAS), mostly embedded in universities, but part of which is the British Atmospheric Data Centre (BADC), which is embedded in the CCLRC

• CCLRC: Council for the Central Laboratory of the Research Councils
  – Is about to be replaced by a new entity, which might be called the “Large Facilities Research Council”

• NERC has seven discipline-based designated data centres (including the BODC and BADC), and requires as much integration of data access as possible.
  – From discovery to utilisation, from genomics to ecology, from oceanography to atmospheric science, from Antarctic science to British geology …

Page 4: The NERC Metadata Gateway:  a product of the NERC DataGrid

http://ndg.nerc.ac.uk

British Atmospheric Data Centre

British Oceanographic Data Centre

Complexity + Volume + Remote Access = Grid Challenge

NCAR

Page 5: The NERC Metadata Gateway:  a product of the NERC DataGrid

If it’s not obvious

• Lots of organisations
  – Varying membership, and trust internally and between each other is not consistent.
• Lots of priorities
  – Not all organisations are “about” data
• Different internal storage structures
  – Data stored in a variety of databases and filesystems.
  – Some things well documented, but not automated
  – Some things automated, but information content is sparse …
• Integrating data access is non-trivial

And none of that includes the important relationships with customers and collaborators!

Page 6: The NERC Metadata Gateway:  a product of the NERC DataGrid

Key Components

Discovery Tools
• Discovery Portal
  – Metadata Search
  – Direct Links to Data and Services

Data Tools
• Slice and Dice
• Visualisation
• Manipulation

Access Control
• Systems are resource limited
• Data access may be restricted by licence

Metadata Structures to support all the above

Page 7: The NERC Metadata Gateway:  a product of the NERC DataGrid

Standards Landscape

Or two:
• ISO TC211 Standards, e.g.
  – ISO 19101: Geographic information – Reference model
  – ISO 19103: Geographic information – Conceptual schema language
  – ISO 19107: Geographic information – Spatial schema
  – ISO 19108: Geographic information – Temporal schema
  – ISO 19109: Geographic information – Rules for application schema
  – ISO 19111: Geographic information – Spatial referencing by coordinates
  – ISO 19115: Geographic information – Metadata
• Open Geospatial Consortium Specs
  – Geography Markup Language (GML), a toolkit for building data descriptions
  – WMS, WCS, WFS, WPS: the Web (Map, Coverage, Feature, and Processing) Services.

Page 8: The NERC Metadata Gateway:  a product of the NERC DataGrid

Standards

• ISO 19101: Geographic information – Reference model

A geospatial dataset…
…consists of features and related objects…
…in a defined logical structure…
…delivered through services…
…and described by metadata.

Page 9: The NERC Metadata Gateway:  a product of the NERC DataGrid

Data Description Standards

• Geographic ‘features’
  – “abstraction of real world phenomena” [ISO 19101]
  – Type or instance
  – Encapsulate important semantics in universe of discourse
  – “Something you can name”
• Application schema
  – Defines semantic content and logical structure
  – ISO standards provide toolkit:
    • spatial/temporal referencing
    • geometry (1-, 2-, 3-D)
    • topology
    • dictionaries (phenomena, units, etc.)
  – GML – canonical encoding

[from ISO 19109 “Geographic information – Rules for Application Schema”]

Page 10: The NERC Metadata Gateway:  a product of the NERC DataGrid

CSML: Climate Science Modelling Language

• Fully Featured GML Application Schema, with extensions for
  – External binary data (GRIB, netCDF etc.)
  – Irregular Grids, “Proper” vertical coordinate systems (both activities now on OGC and ISO standards tracks)

• V1.0 included seven feature types and provided only “data” modelling.

• V1.0 CSML tooling includes a scanner (creates CSML from netCDF files) and a parser (instantiates Python objects, based on the XML CSML documents, which can be manipulated scientifically).

Page 11: The NERC Metadata Gateway:  a product of the NERC DataGrid

MarineXML Testbed

[Figure: MarineXML testbed data-flow diagram. Component labels: XML Parser, SeeMyDENC, Data Dictionary, S52 Portrayal Library, SENC, MarineGML (NDG) Feature Types, S-57v3 GML. Source datasets: Biological Species, Chl-a from Satellite, Modelled Hydrodynamics, Measured Hydrodynamics, each shown with XML, XSD and XSLT.]

Annotations:
• Data from different parts of the marine community conforming to a variety of schema (XSD).
• For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FT) defined by CSML. The FTs and XSLT are maintained in a ‘MarineXML registry’.
• The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features.
• The FTs can then be translated to equivalent FTs for display in the ECDIS system.
• Features in the source XSD must be present in the data dictionary.
• Phenomena in the XSD must have an associated portrayal.
• Features described using the S-57v3.1 Application Schema can be imported and are equivalent to the same features in CSML.
• ECDIS acts as an example client for the data.

Slide adapted from Kieran Millard (AUKEGGS, 2005)

Page 12: The NERC Metadata Gateway:  a product of the NERC DataGrid

The Concept of re-using Features

• Here structured XML is converted to plain ASCII text in the form required for a numerical model.
• HTML warning service pages are generated ‘on the fly’.
• XML can also be converted to SVG to display data graphically.
• Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts.

All this requires agreement on standards

Slide adapted from Kieran Millard (AUKEGGS, 2005)

Page 13: The NERC Metadata Gateway:  a product of the NERC DataGrid

CSML Round Tripping - 1

Managing semantics

[Figure: round-tripping diagram. Labels: conceptual model (UML: Class1, Class2, «datatype» DataType1); UGAS; GML app schema (XML); GML dataset (instance), “Conforms to”; Application “produces” New Dataset (binary, “101010”); parser, V1.0 (Python, Complete).]

GML dataset fragment shown on the slide (truncated on the slide):

    <gml:featureMember>
      <NDGPointFeature gml:id="ICES_100">
        <NDGPointDomain>
          <domainReference>
            <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
              <location>55.25 6.5</location>
            </NDGPosition>
          </domainReference>
        </NDGPointDomain>
        <gml:rangeSet>
          <gml:DataBlock>
            <gml:rangeParameters>
              <gml:CompositeValue>
                <gml:valueComponents>
                  <gml:measure uom="#tn"/>
                  <gml:measure uom="#amount"/>
                  <gml:measure uom="#gsm"/>
                </gml:valueComponents>
              </gml:CompositeValue>
            </gml:rangeParameters>
            <gml:tupleList>
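
To make the parser step concrete, here is a minimal sketch of reading a point feature like the one in the slide's fragment with the Python standard library. It is not the CSML parser. The slide's fragment stops at `<gml:tupleList>`, so the closing tags, the `Dataset` wrapper element, the tuple values and the function name `read_point_features` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"  # GML 3.x namespace

# Hypothetical completion of the slide's fragment: the wrapper element,
# the closing tags and the tuple values are invented for this example.
SAMPLE = f"""
<Dataset xmlns:gml="{GML_NS}">
  <gml:featureMember>
    <NDGPointFeature gml:id="ICES_100">
      <NDGPointDomain>
        <domainReference>
          <NDGPosition srsName="urn:EPSG:geographicCRS:4979"
                       axisLabels="Lat Long" uomLabels="degree degree">
            <location>55.25 6.5</location>
          </NDGPosition>
        </domainReference>
      </NDGPointDomain>
      <gml:rangeSet>
        <gml:DataBlock>
          <gml:rangeParameters>
            <gml:CompositeValue>
              <gml:valueComponents>
                <gml:measure uom="#tn"/>
                <gml:measure uom="#amount"/>
                <gml:measure uom="#gsm"/>
              </gml:valueComponents>
            </gml:CompositeValue>
          </gml:rangeParameters>
          <gml:tupleList>1.0,2.0,3.0</gml:tupleList>
        </gml:DataBlock>
      </gml:rangeSet>
    </NDGPointFeature>
  </gml:featureMember>
</Dataset>
"""


def read_point_features(xml_text):
    """Return one dict per NDGPointFeature: id, position and range values."""
    ns = {"gml": GML_NS}
    root = ET.fromstring(xml_text)
    features = []
    for feature in root.iterfind(".//gml:featureMember/NDGPointFeature", ns):
        # The location element holds "lat lon", as axisLabels states.
        lat, lon = (float(v) for v in feature.findtext(".//location").split())
        # Pair each unit reference with the matching value in the tuple.
        units = [m.get("uom") for m in feature.iterfind(".//gml:measure", ns)]
        tuple_text = feature.findtext(".//gml:tupleList", namespaces=ns)
        values = [float(v) for v in tuple_text.split(",")]
        features.append({
            "id": feature.get(f"{{{GML_NS}}}id"),
            "lat": lat,
            "lon": lon,
            "range": dict(zip(units, values)),
        })
    return features


features = read_point_features(SAMPLE)
```

Each returned dictionary carries the feature identifier, the position, and the range values keyed by their unit references, which is roughly the kind of object a scientific client would go on to manipulate.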

Page 14: The NERC Metadata Gateway:  a product of the NERC DataGrid

CSML Round Tripping - 2

Managing data - 1

[Figure: round-tripping diagram. Labels: Application “produces” CF Dataset (binary, “101010”); scanner (V1.0, V2 in development); GML dataset (instance) and GML app schema (XML); parser (V1.0, V2 in development). The GML dataset fragment is the same one shown on Page 13.]

Page 15: The NERC Metadata Gateway:  a product of the NERC DataGrid

CSML2: Structure “Affords” Behaviour

[UML class diagram: cd ProfileSeriesFeature]

Coverage Types::ProfileSeriesCoverage
  +/ domainSet: ProfileSeriesDomain
  +/ rangeSet: Record [0..*]

«FeatureType» ProfileSeriesFeature
  + location: GM_Point [0..1]
  + time: TM_Instant [0..1]

AnyDefinition
«ObjectType» phenomenon::Phenomenon

CV_DiscreteCoverage
Discrete Coverages::CV_DiscreteGridPointCoverage
  + find(DirectPosition*, Integer*) : Sequence<CV_GridPointValuePair>
  + list() : Set<CV_GridPointValuePair>
  + locate(DirectPosition*) : Set<CV_GridPointValuePair>
  + point(CV_GridCoordinate*) : CV_GridPointValuePair

«type» FT types::ProfileSeriesType
  + location: GM_Point [0..1]
  + time: TM_Instant [0..1]
  + value: ProfileSeriesCoverage
  + extractPoint() : PointFeature
  + extractPointSeries() : PointSeriesFeature
  + extractProfile() : ProfileFeature
  + extractProfileSeries() : ProfileSeriesFeature

Association labels: +parameter, +value, «realize», «implement»

Annotations:
• ISO 19123 coverage class
• ‘Affordance’ modelled with UML <<type>>
• Moving beyond GML, but staying in the ISO Frame!
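
The idea that a feature type's structure "affords" behaviour can be sketched in Python: a profile series, once modelled as values on a time by level grid, naturally offers extraction of a single profile or a single point series. This is only an illustration of the concept and not CSML2 code. The class layout, the snake_case method names and the sample values are assumptions; only the operation names echo `extractProfile()` and `extractPointSeries()` from the diagram.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProfileFeature:
    """All levels at one time."""
    time: str
    levels: List[float]
    values: List[float]


@dataclass
class PointSeriesFeature:
    """All times at one level."""
    level: float
    times: List[str]
    values: List[float]


@dataclass
class ProfileSeriesFeature:
    """values[i][j] is the value at times[i] and levels[j]."""
    location: Optional[Tuple[float, float]]
    times: List[str]
    levels: List[float]
    values: List[List[float]]

    def extract_profile(self, time_index: int) -> ProfileFeature:
        # Fix the time axis, keep the whole vertical axis.
        return ProfileFeature(
            self.times[time_index],
            list(self.levels),
            list(self.values[time_index]),
        )

    def extract_point_series(self, level_index: int) -> PointSeriesFeature:
        # Fix the vertical axis, keep the whole time axis.
        return PointSeriesFeature(
            self.levels[level_index],
            list(self.times),
            [row[level_index] for row in self.values],
        )


# Hypothetical sample data: two times, three depth levels.
series = ProfileSeriesFeature(
    location=(55.25, 6.5),
    times=["2006-11-01", "2006-11-02"],
    levels=[0.0, 10.0, 20.0],
    values=[[12.1, 11.8, 11.2], [12.0, 11.7, 11.1]],
)
profile = series.extract_profile(1)
point_series = series.extract_point_series(0)
```

The point of the design is that the extraction operations follow from the shape of the coverage, so a client that knows the feature type knows which subsetting operations make sense without inspecting the data.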

Page 16: The NERC Metadata Gateway:  a product of the NERC DataGrid

CSML2: Related to new OGC Observations and Measurements Spec

[UML class diagram: cd O&M]

Event
«FeatureType» observation::Observation
  + quality: DQ_DataQuality [0..1]
  + responsible: CI_ResponsibleParty [0..1]
  + result: Any

«Union» procedure::Procedure
  + procedureUse: ProcedureEvent
  + standardProcedure: ProcedureSystem

AnyDefinition
«ObjectType» phenomenon::Phenomenon

«FeatureType» Feature Types::ProfileSeriesFeature
  + location: GM_Point [0..1]
  + time: TM_Instant [0..1]

CV_DiscreteGridPointCoverage
Coverage Types::ProfileSeriesCoverage
  +/ domainSet: ProfileSeriesDomain
  +/ rangeSet: Record [0..*]

Association labels: +generatedObservation 0..*, +procedure 1, +observedProperty 1 {Definition must be of a phenomenon that is a property of the featureOfInterest}, +parameter, +value, +result, +featureOfInterest

An Observation is an Event whose result is an estimate of the value of some Property of the Feature-of-interest, obtained using a specified Procedure

Page 17: The NERC Metadata Gateway:  a product of the NERC DataGrid

Managing Data 2

[Figure: workflow diagram. Labels: Define Dataset; CF Dataset (binary, “101010”); scanner; GML dataset (the same fragment shown on Page 13); Add Information; DECISION PROCESSES; XSLT; ISO19115 XML; PUBLISH.]

Page 18: The NERC Metadata Gateway:  a product of the NERC DataGrid

The Most Important Decision

What is a dataset?

Granularity too coarse: can’t find what you want – not enough information exposed.

Granularity too fine: can’t find what you want – buried in unordered results.

Page 19: The NERC Metadata Gateway:  a product of the NERC DataGrid

Distributed Query

Options:
• Harvest or Crawl
• Distribute query to known targets versus harvest from known targets and do local query
  – Timeliness versus Responsiveness

Decision:
• NDG Discovery based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
  – Additional partners include NCAR, MPI-WDCC, TPAC, UK-MDIP
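
As a rough illustration of the harvesting choice, the sketch below builds OAI-PMH `ListRecords` requests and reads record identifiers and the resumption token from a response. It is not the NDG harvester. The base URL, the `dif` metadata prefix, the sample response and the function names are assumptions; the `verb`, `metadataPrefix` and `resumptionToken` arguments and the response namespace come from the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH provider."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        element.text
        for element in root.iterfind(
            ".//oai:record/oai:header/oai:identifier", OAI_NS
        )
    ]
    # An absent or empty token means the list is complete.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return identifiers, (token or None)


# Hypothetical, heavily trimmed provider response for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:dataset-1</identifier></header></record>
    <record><header><identifier>oai:example.org:dataset-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai", metadata_prefix="dif")
identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A harvester would repeat the request with each returned token until none comes back, then run its queries against the local copy. That is the trade the slide names: responsiveness is gained at the cost of timeliness.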

Page 20: The NERC Metadata Gateway:  a product of the NERC DataGrid

Discovery Metadata Usage

[Figure: discovery architecture diagram. Components: for each of Data Centre 1 and Data Centre 2, a local metadata store and local discovery metadata in an OAI provider; the NDG discovery metadata store; a query WS interface; Portal 1 GUI and Portal 2 GUI. Arrow labels: Generate, OAI Harvesting, Feeds, Query, Results.]

XML metadata store: can support a limited variety of different XML schemas, provided the WS interface understands them (a unique XQuery is needed for each (method, schema) pair).
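
The requirement for one query per (method, schema) pair can be pictured as a lookup table. The sketch below uses ElementTree path expressions as a stand-in for XQuery, so it shows the dispatch pattern rather than the NDG implementation. The method name `title`, the registry layout and the simplified sample records are assumptions; `Entry_Title` is the DIF title element and `dc:title` the Dublin Core one.

```python
import xml.etree.ElementTree as ET

DC = {"dc": "http://purl.org/dc/elements/1.1/"}

# One entry per (method, schema) pair: (path expression, namespace map).
# Adding a schema means adding one entry per supported method.
QUERIES = {
    ("title", "DIF"): ("./Entry_Title", {}),
    ("title", "DC"): ("./dc:title", DC),
}


def run_query(method, schema, record_xml):
    """Run the query registered for this method and schema on one record."""
    try:
        path, namespaces = QUERIES[(method, schema)]
    except KeyError:
        raise LookupError(
            f"no query registered for {method!r} on {schema!r}"
        ) from None
    root = ET.fromstring(record_xml)
    return [element.text for element in root.iterfind(path, namespaces)]


# Simplified, hypothetical records (real DIF records are namespaced
# and carry many more elements).
DIF_RECORD = "<DIF><Entry_Title>Sea surface temperature</Entry_Title></DIF>"
DC_RECORD = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Sea surface temperature</dc:title></record>"
)

dif_titles = run_query("title", "DIF", DIF_RECORD)
dc_titles = run_query("title", "DC", DC_RECORD)
```

The table grows as methods times schemas, which is one way to read the slide's caveat that only a limited variety of schemas can be supported.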

Page 21: The NERC Metadata Gateway:  a product of the NERC DataGrid

Metadata Formats

Currently Supporting
• NASA Global Change Master Directory: Directory Interchange Format (DIF)

Experimenting with:
• Vanilla ISO19139
• Dublin Core
• UK Gemini V1 format

Will support following ISO profiles for harvest:
• (eventually) UK Gemini profile
• WMO profile
• IOC profile
• (whenever) US FGDC profile

ALL SIMULTANEOUSLY: XML Database plus appropriate XQueries

Page 22: The NERC Metadata Gateway:  a product of the NERC DataGrid

Simulation in the context of ISO19139: NumSim

NDG Products: NumSim

Page 23: The NERC Metadata Gateway:  a product of the NERC DataGrid

NumSim Example

Page 24: The NERC Metadata Gateway:  a product of the NERC DataGrid

Firefox Search Plugin

Page 25: The NERC Metadata Gateway:  a product of the NERC DataGrid

International Discovery - Climate

Page 26: The NERC Metadata Gateway:  a product of the NERC DataGrid

NDG “New Interface”

Page 27: The NERC Metadata Gateway:  a product of the NERC DataGrid

Within Record

Scrolling Down

Page 28: The NERC Metadata Gateway:  a product of the NERC DataGrid

New Interfaces

(No CSS as yet)

[Screenshots: Simple and Advanced search interfaces]

Issues:
• Times (forecast, paleo etc.)
• BBOX (near poles and dateline)
• Semantic vocabulary matching (exploiting a new NDG web service providing thesaurus content and ontology mapping)
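
The dateline part of the BBOX issue can be shown with a small sketch. A box whose western edge is numerically greater than its eastern edge is taken to cross the dateline and is split into two ordinary longitude ranges before testing for overlap. The `BBox` class, the convention that west greater than east means a crossing, and the sample boxes are assumptions for illustration; the sketch does not address the polar cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    west: float
    south: float
    east: float
    north: float

    def lon_ranges(self):
        """Longitude ranges covered, split in two if the box crosses 180."""
        if self.west <= self.east:
            return [(self.west, self.east)]
        # west > east is read as "crosses the dateline".
        return [(self.west, 180.0), (-180.0, self.east)]


def intersects(a, b):
    """True if the two boxes overlap in both latitude and longitude."""
    if a.north < b.south or b.north < a.south:
        return False
    return any(
        w1 <= e2 and w2 <= e1
        for w1, e1 in a.lon_ranges()
        for w2, e2 in b.lon_ranges()
    )


# Hypothetical boxes: one straddling the dateline, two that do not.
straddling = BBox(west=175.0, south=-20.0, east=-175.0, north=-10.0)
east_of_dateline = BBox(west=-178.0, south=-22.0, east=-173.0, north=-15.0)
far_away = BBox(west=0.0, south=-20.0, east=10.0, north=-10.0)

hit = intersects(straddling, east_of_dateline)
miss = intersects(straddling, far_away)
```

A naive comparison of west and east values would report no overlap between the first two boxes, because 175 is greater than -173, which is exactly the failure a discovery search near the dateline has to avoid.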

Page 29: The NERC Metadata Gateway:  a product of the NERC DataGrid

• Metadata extensions and profiles

ISO

Page 30: The NERC Metadata Gateway:  a product of the NERC DataGrid

ISO19139

Background:
• Designed to exploit as much as possible of the XML Schema machinery
• Not designed for Humans!

Advice:
• Use in conjunction with a clear concept of why it’s being used:
• Decide on dataset granularity, and use other metadata schema to describe how to use content (“A” metadata; e.g. an application schema of GML).
• Devise a profile with utility then: restrict, restrict, restrict. Document. Register.

Page 31: The NERC Metadata Gateway:  a product of the NERC DataGrid

On Restriction

ISO19139 is also about INTEROPERABILITY!

• Don’t follow the ISO19139 advice and produce a new schema!

• Ensure that your profile instances are valid vanilla ISO19139

• Restrict content out-of-band, e.g. schematron, etc.

• Agree on how you’re going to deploy ISO19139
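
Restricting out-of-band means the instance stays valid against the unmodified ISO19139 schema while a separate rule set enforces the profile. The sketch below stands in for Schematron with plain Python checks, so it shows the idea rather than a Schematron implementation. The two rules, the function name `check_profile` and the trimmed sample record (which is not a complete valid ISO19139 record) are assumptions; the `gmd` and `gco` namespace URIs are those of ISO19139.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical profile rules: elements that are optional in the base
# standard but that this imagined profile makes mandatory.
RULES = [
    ("fileIdentifier is required", "./gmd:fileIdentifier/gco:CharacterString"),
    ("language is required", "./gmd:language/gco:CharacterString"),
]


def check_profile(xml_text):
    """Return the messages of all profile rules the record fails."""
    root = ET.fromstring(xml_text)
    failures = []
    for message, path in RULES:
        text = root.findtext(path, default="", namespaces=NS)
        if not text.strip():
            failures.append(message)
    return failures


# Trimmed fragment for illustration only.
RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>example-001</gco:CharacterString>
  </gmd:fileIdentifier>
</gmd:MD_Metadata>"""

failures = check_profile(RECORD)
```

Because the rules live outside the schema, a third party that knows nothing about the profile can still validate and parse the record as vanilla ISO19139.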

Page 32: The NERC Metadata Gateway:  a product of the NERC DataGrid

On Extension

ISO19139 is also about INTEROPERABILITY!

• Do follow the ISO19139 advice and produce a new schema!

• Do what you need for your community, but:

• Design so that code expecting ISO19139 instances can parse yours!

• Make it easy for third party code to ignore your content!
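
One way a reader can cope with extensions is sketched below: elements in the `gmd` namespace are taken at face value, extended elements are mapped back to the ISO type named in their `gco:isoType` attribute, and anything else is skipped. This is an illustrative reading of the advice and not NDG code. The `ext` namespace, the element names `EX_SimulationIdentification` and `ensembleSize`, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


def iso_type(element):
    """Name of the ISO type to treat this element as, or None to ignore it."""
    if element.tag.startswith(f"{{{GMD}}}"):
        return "gmd:" + element.tag.split("}", 1)[1]
    # Extended elements declare the ISO type they specialise.
    return element.get(f"{{{GCO}}}isoType")


def recognised_children(element):
    """Children a plain ISO19139 reader can process; others are dropped."""
    types = (iso_type(child) for child in element)
    return [name for name in types if name is not None]


# Hypothetical extended record; the ext namespace and names are invented.
SAMPLE = f"""<gmd:MD_Metadata xmlns:gmd="{GMD}" xmlns:gco="{GCO}"
                 xmlns:ext="http://example.org/profile">
  <gmd:identificationInfo>
    <ext:EX_SimulationIdentification gco:isoType="gmd:MD_DataIdentification">
      <gmd:abstract><gco:CharacterString>Model run</gco:CharacterString></gmd:abstract>
      <ext:ensembleSize>10</ext:ensembleSize>
    </ext:EX_SimulationIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""

root = ET.fromstring(SAMPLE)
identification = root.find(f"{{{GMD}}}identificationInfo")[0]
treated_as = iso_type(identification)
kept = recognised_children(identification)
```

Here the extended identification element is handled as `gmd:MD_DataIdentification`, its standard `gmd:abstract` child is kept, and the community-specific child is silently ignored.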

Page 33: The NERC Metadata Gateway:  a product of the NERC DataGrid

Summary

• NDG dealing with heterogeneous environment

• Successful deployment of OAI with discovery metadata

(There are some issues differentiating between model simulations and ordering response sets)

• Directly linking to and exploiting GML application schema

• Web Service backends make deployment easier.

• Communities need to be very careful how they deploy ISO19139