Data integration with the Climate Science Modelling Language. Andrew Woolf¹, Bryan Lawrence², Roy Lowry³, Kerstin Kleese van Dam¹, Ray Cramer³, Marta Gutierrez², Siva Kondapalli³, Susan Latham², Dominic Lowe², Kevin O’Neill¹, Ag Stephens². ¹CCLRC e-Science Centre, ²British Atmospheric Data Centre, ³British Oceanographic Data Centre

NESC workshop, Grid and Geospatial Standards, 7-Sep-2005

Jan 12, 2016

Transcript
Page 1:

Data integration with the Climate Science Modelling Language

Andrew Woolf¹, Bryan Lawrence², Roy Lowry³, Kerstin Kleese van Dam¹, Ray Cramer³, Marta Gutierrez², Siva Kondapalli³, Susan Latham², Dominic Lowe², Kevin O’Neill¹, Ag Stephens²

¹ CCLRC e-Science Centre
² British Atmospheric Data Centre
³ British Oceanographic Data Centre

Page 2:

Outline

Background

Standards – a framework for interoperability

Climate Science Modelling Language (CSML)

Using CSML

Page 3:

Background

Data integration requirements:
• scalability across providers
• warehousing not an option
• enhance access and use, ‘outwards-facing’ (e.g. impacts community, policymakers)
• storage heterogeneity

Semantics as integration ‘key’
• common language across providers (and users)
• supports wrapper/mediator architecture

Page 4:

Standards

Emerging ISO standards
• TC211 – around 40 standards for geographic information
• Cover activity spectrum: discovery → access → use

ISO 19101 Domain Reference Model:
A geospatial dataset…
…consists of features and related objects…
…in a defined logical structure…
…delivered through services…
…and described by metadata.

Page 5:

Standards

Geographic ‘features’
• “abstraction of real world phenomena” [ISO 19101]
• Type or instance
• Encapsulate important semantics in universe of discourse

Application schema
• Defines semantic content and logical structure of datasets
• ISO standards provide toolkit:
  • spatial/temporal referencing
  • geometry (1-, 2-, 3-D)
  • topology
  • dictionaries (phenomena, units, etc.)
  • GML – canonical encoding

[from ISO 19109 “Geographic information – Rules for Application Schema”]

Page 6:

Standards

The importance of governance
• Information community defined by shared semantics
• Need community process to manage those semantics (definitions, models, vocabularies, taxonomies, etc.)
  • e.g. CF conventions for netCDF files
• Role of Feature Type Catalogues [ISO 19110] and registers [ISO 19135]

Governance as driver for granularity
• Remit / interest determines appropriate granularity
  • e.g. IOC, IHO, WMO

[Figure: feature types spectrum, running from ‘abstract, generic’ to ‘highly specialised’. Example encodings shown:]
<temperatureProfile/>
<measurement type="Radiosonde" measurand="temperature"/>
<Sonde parameter="temperature"/>

Page 7:

Climate Science Modelling Language

Aims:
• provide semantic integration mechanism for NDG data
• explore new standards-based interoperability framework
• emphasise content, not container

Design principles:
• offload semantics onto parameter type (‘phenomenon’, observable, measurand)
  • e.g. wind-profiler, balloon temperature sounding
• offload semantics onto CRS
  • e.g. scanning radar, sounding radar
• ‘sensible plotting’ as discriminant
  • ‘in-principle’ unsupervised portrayal
• explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit)

Page 8:

Climate Science Modelling Language

CSML feature types
• defined on basis of geometric and topologic structure

CSML feature type | Description | Examples
TrajectoryFeature | Discrete path in time and space of a platform or instrument. | ship’s cruise track, aircraft’s flight path
PointFeature | Single point measurement. | raingauge measurement
ProfileFeature | Single ‘profile’ of some parameter along a directed line in space. | wind sounding, XBT, CTD, radiosonde
GridFeature | Single time-snapshot of a gridded field. | gridded analysis field
PointSeriesFeature | Series of single datum measurements. | tidegauge, rainfall timeseries
ProfileSeriesFeature | Series of profile-type measurements. | vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature | Timeseries of gridded parameter fields. | numerical weather prediction model, ocean general circulation model

Page 9:

Climate Science Modelling Language

CSML feature types
• examples...

[Figure: examples labelled ProfileSeriesFeature, ProfileFeature and GridFeature]

Page 10:

Climate Science Modelling Language

Application schema
• logical structure and semantic content of NDG ‘Dataset’
• Based on GML 3.1

[UML class diagram. Types shown: «Type» GML::AbstractGMLType, «Type» Dataset, «Type» UnitDefinitions, «Type» ReferenceSystemDefinitions, «Type» PhenomenonDefinitions, «Type» AbstractArrayDescriptor, «Type» GML::FeatureCollection. Associations carry ‘*’ multiplicities.]

Page 11:

Climate Science Modelling Language

Integration approaches: wrapper/mediator

[Diagram: BADC and BODC, each with a wrapper. Remaining labels, shown twice: mediator, Service, data, Interface.]
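
The wrapper/mediator arrangement named on this slide can be illustrated with a minimal Python sketch. Everything here is hypothetical: the class names, the `fetch` method and the in-memory records are stand-ins, since the slides do not define an interface. Each wrapper hides one provider's storage behind a common call, and the mediator gives the client a single entry point.

```python
from typing import Dict, List, Optional


class DictWrapper:
    """Wraps one provider's native store and returns records in a common form.

    Hypothetical: a dict stands in for the provider's own file store.
    """

    def __init__(self, provider: str, store: Dict[str, List[float]]):
        self.provider = provider
        self._store = store

    def fetch(self, feature_id: str) -> Optional[dict]:
        values = self._store.get(feature_id)
        if values is None:
            return None
        return {"provider": self.provider, "id": feature_id, "values": values}


class Mediator:
    """Presents a single query interface over several wrappers."""

    def __init__(self, wrappers):
        self._wrappers = list(wrappers)

    def fetch(self, feature_id: str) -> Optional[dict]:
        # Ask each wrapper in turn; the client never sees which store answered
        # except through the 'provider' field of the common record.
        for wrapper in self._wrappers:
            record = wrapper.fetch(feature_id)
            if record is not None:
                return record
        return None


# Made-up identifiers and values, for illustration only.
badc = DictWrapper("BADC", {"sonde01": [12.5, 10.1, 7.3]})
bodc = DictWrapper("BODC", {"ctd07": [8.2, 8.0, 7.6]})
mediator = Mediator([badc, bodc])
print(mediator.fetch("ctd07"))
```

The point of the design is that adding a provider means adding a wrapper, with no change to the mediator or its clients.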

Page 12:

Climate Science Modelling Language

Numerical array descriptors
• provides ‘wrapper’ architecture for legacy data files
• ‘Connected’ to data model numerical content through ‘xlink:href’

Three subtypes:
• InlineArray
• ArrayGenerator
• FileExtract (NASAAmes, NetCDF, GRIB)

Composite design pattern for aggregation

[UML class diagram:]
«Type» GML::AbstractGMLType: +id, +metaDataProperty, +description, +name
«Type» AbstractArrayDescriptor: +arraySize[1], +uom[0..1], +numericType[0..1], +numericTransform[0..1], +regExpTransform[0..1]
«Type» AggregatedArray: +aggType[1], +aggIndex[1]; association +component (multiplicities 1 and *)
«Type» InlineArray: +values[*]
«Type» ArrayGenerator: +expression[1]
«Type» AbstractFileExtract: +fileName[1]
«Type» NASAAmesExtract: +variableName[1], +index[0..1]
«Type» NetCDFExtract: +variableName[1]
«Type» GRIBExtract: +parameterCode[1], +recordNumber[0..1], +fileOffset[0..1]
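
A minimal Python sketch of the composite design pattern named on this slide. The class names follow the diagram, but the `read()` method on every descriptor and the flat-list return value are assumptions made for illustration. The sketch ignores `arraySize`, `aggType` and `aggIndex` and simply concatenates components.

```python
from abc import ABC, abstractmethod
from typing import List


class AbstractArrayDescriptor(ABC):
    """Common interface: every descriptor can yield its values."""

    @abstractmethod
    def read(self) -> List[float]:
        """Return the described values as a flat list (assumed signature)."""


class InlineArray(AbstractArrayDescriptor):
    """Leaf: the values are carried in the descriptor itself."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def read(self) -> List[float]:
        return list(self.values)


class AggregatedArray(AbstractArrayDescriptor):
    """Composite: holds component descriptors and is itself a descriptor."""

    def __init__(self, components):
        self.components = list(components)

    def read(self) -> List[float]:
        out: List[float] = []
        for component in self.components:
            # A component may be a leaf or another AggregatedArray.
            out.extend(component.read())
        return out


nested = AggregatedArray([
    InlineArray([1, 2]),
    AggregatedArray([InlineArray([3]), InlineArray([4, 5])]),
])
print(nested.read())
```

Because an aggregate is itself a descriptor, client code can call `read()` without knowing whether it holds a single array or a tree of them.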

Page 13:

Climate Science Modelling Language

Inline array

<NDGInlineArray>
  <arraySize>5 2</arraySize>
  <uom>udunits.xml#degreeC</uom>
  <numericType>float</numericType>
  <regExpTransform>s/10/9/ge</regExpTransform>
  <numericTransform>+5</numericTransform>
  <values>1 2 3 4 5 6 7 8 9 10</values>
</NDGInlineArray>

Array generator

<NDGArrayGenerator>
  <arraySize>10001</arraySize>
  <uom>udunits.xml#minute</uom>
  <numericType>float</numericType>
  <expression>0:5:50000</expression>
</NDGArrayGenerator>
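
The array generator example can be checked with a short Python sketch. It assumes the expression reads as `start:step:stop` with an inclusive stop, an interpretation the slides do not state but which reproduces the declared `arraySize` of 10001. The function name is hypothetical, and only integer expressions with a positive step are handled.

```python
from typing import List


def expand_generator(expression: str) -> List[int]:
    """Expand 'start:step:stop' into explicit values (stop inclusive, assumed)."""
    start, step, stop = (int(part) for part in expression.split(":"))
    if step <= 0:
        raise ValueError("this sketch only handles a positive step")
    return list(range(start, stop + 1, step))


values = expand_generator("0:5:50000")
print(len(values), values[:3], values[-1])
```

The generator form lets a regular coordinate axis, such as a time axis in minutes, be described in one line instead of being stored or listed in full.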

Page 14:

Climate Science Modelling Language

File extract

<NDGNASAAmesExtract>
  <arraySize>526</arraySize>
  <numericType>double</numericType>
  <fileName>/data/BADC/macehead/mh960606.cf1</fileName>
  <variableName>CFC-12</variableName>
</NDGNASAAmesExtract>

<NDGNetCDFExtract gml:id="feat04azimuth">
  <arraySize>10000</arraySize>
  <fileName>radar_data.nc</fileName>
  <variableName>az</variableName>
</NDGNetCDFExtract>

<NDGGRIBExtract>
  <arraySize>320 160</arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas1992010100rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDGGRIBExtract>
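
A minimal Python sketch of how a file extract descriptor like the NetCDF one above might be read into a typed record before any file access. The dataclass and function names are hypothetical. The `xmlns:gml` declaration is added here so that the fragment parses on its own, and the CSML elements are assumed to be in no namespace, which the slides do not confirm.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional, Tuple

GML_NS = "http://www.opengis.net/gml"  # GML 3.1 namespace

# The slide's fragment, with a namespace declaration added for parsing.
DOC = """<NDGNetCDFExtract xmlns:gml="http://www.opengis.net/gml" gml:id="feat04azimuth">
  <arraySize>10000</arraySize>
  <fileName>radar_data.nc</fileName>
  <variableName>az</variableName>
</NDGNetCDFExtract>"""


@dataclass
class NetCDFExtract:
    gml_id: Optional[str]
    array_size: Tuple[int, ...]
    file_name: str
    variable_name: str


def parse_netcdf_extract(xml_text: str) -> NetCDFExtract:
    """Turn the XML descriptor into a record; the data file is not opened."""
    elem = ET.fromstring(xml_text)
    return NetCDFExtract(
        gml_id=elem.get(f"{{{GML_NS}}}id"),
        array_size=tuple(int(n) for n in elem.findtext("arraySize").split()),
        file_name=elem.findtext("fileName"),
        variable_name=elem.findtext("variableName"),
    )


extract = parse_netcdf_extract(DOC)
print(extract)
```

A real reader would then use `file_name` and `variable_name` to pull the values from the file with a netCDF library. That step is left out here.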

Page 15:

Climate Science Modelling Language

Aggregated array
• arrays may be aggregated along an ‘existing’ or ‘new’ dimension

<NDGAggregatedArray gml:id="feat05cruisetrack">
  <arraySize>2 50</arraySize>
  <aggType>new</aggType>
  <aggIndex>1</aggIndex>
  <component>
    <NDGNetCDFExtract>
      <arraySize>50</arraySize>
      <fileName>cruisetrack.nc</fileName>
      <variableName>alat</variableName>
    </NDGNetCDFExtract>
  </component>
  <component>
    <NDGNetCDFExtract>
      <arraySize>50</arraySize>
      <fileName>cruisetrack.nc</fileName>
      <variableName>alon</variableName>
    </NDGNetCDFExtract>
  </component>
</NDGAggregatedArray>
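
The difference between aggregating along a ‘new’ and an ‘existing’ dimension can be sketched in Python for one-dimensional components. The function name is hypothetical and `aggIndex` is ignored: the new dimension is always placed first, which matches the `2 50` array size in the example above. The latitude and longitude values are made up.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def aggregate(components: Sequence[Sequence[Number]], agg_type: str) -> List:
    """Combine 1-D components along a 'new' or an 'existing' dimension."""
    if agg_type == "new":
        # Stack: each component becomes one row, so the rank grows by one.
        if len({len(c) for c in components}) > 1:
            raise ValueError("components must have equal length to be stacked")
        return [list(c) for c in components]
    if agg_type == "existing":
        # Concatenate end to end, so the rank stays the same.
        return [value for c in components for value in c]
    raise ValueError("agg_type must be 'new' or 'existing'")


alat = [50.0 + 0.1 * i for i in range(50)]  # made-up latitudes
alon = [-5.0 + 0.1 * i for i in range(50)]  # made-up longitudes

track = aggregate([alat, alon], "new")
joined = aggregate([alat, alon], "existing")
print(len(track), len(track[0]), len(joined))
```

In the cruise track case the two 50-element arrays form a 2 × 50 array of positions. Concatenating them along the existing dimension would instead give a single 100-element array.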

Page 16:

Climate Science Modelling Language

Provides semantic abstraction layer

[Diagram labels: instantiateNetCDF(DatasetID, FeatureID); CSMLAbstractFeature with +writeNetCDF(); (SAX) demarshalling; AbstractFileExtract with +read(); <CSML> documents; filestore; NetCDF, WCS, WFS, OPeNDAP, ....]

Page 17:

Climate Science Modelling Language

Status:
• Initial feature types defined
• First draft application schema complete
• Trial software tooling being coded (parser, netCDF instantiation)
• Initial deployment trial across BODC, BADC datasets

Future:
• Separate out wrapper implementation (array descriptors)
• Disallow ‘internal’ dictionaries
• More strongly-typed features?
• Follow (and pursue!) GML evolution, enhance compliance
• Expand tooling

Related work
• WMO, IOC, IHO
• MarineXML
• MOTIIVE (INSPIRE)

Page 18:

“MarineXML is an initiative of the IOC/IODE of UNESCO to improve marine data exchange within the marine community. The European Commission has provided a funding contribution to this initiative as part of its 5th Framework Programme to undertake a ‘pre-standardisation’ task of identifying the approaches the marine community should adopt regarding XML technology to achieve improved data exchange.”

EU project – MarineXML

<gml:definitionMember>
  <om:Phenomenon gml:id="taxon">
    <gml:description>The taxon name</gml:description>
    <gml:name codeSpace="http://www.vliz.be">taxon</gml:name>
  </om:Phenomenon>
</gml:definitionMember>
</NDGPhenomenonDefinitions>
<!--===================================================================-->
<gml:FeatureCollection>
  <!-- ============================================================== -->
  <gml:featureMember>
    <NDGPointFeature gml:id="ICES_100">
      <NDGPointDomain>
        <domainReference>
          <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
            <location>55.25 6.5</location>
          </NDGPosition>
        </domainReference>
      </NDGPointDomain>
      <gml:rangeSet>
        <gml:DataBlock>
          <gml:rangeParameters>
            <gml:CompositeValue>
              <gml:valueComponents>
                <gml:measure uom="#tn"/>
                <gml:measure uom="#amount"/>
                <gml:measure uom="#gsm"/>
              </gml:valueComponents>
            </gml:CompositeValue>
          </gml:rangeParameters>
          <gml:tupleList>
            'ANTHOZOA',63.1,missing
            'Scoloplos armiger',66.1,missing
            'Spio filicornis',10,missing
            'Spiophanes bombyx',60.3,missing
            'Capitellidae',131.8,missing
            'Pholoe',10,missing
            'Owenia fusiformis',23.4,missing
            'Hypereteone lactea',6.8,missing
            'Anaitides groenlandica',13.2,missing
            'Anaitides mucosa',6.8,missing
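
A minimal Python sketch of how the `gml:tupleList` content in this excerpt might be split into records. It assumes each tuple is a quoted taxon name followed by two comma-separated fields, in the order of the three `gml:measure` components, and that the literal `missing` marks an absent value. The function name and the mapping of `missing` to `None` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# One tuple: 'quoted name',number,number-or-missing
TUPLE_RE = re.compile(r"'([^']*)',([^,\s]+),(\S+)")


def parse_tuple_list(text: str) -> List[Tuple[str, float, Optional[float]]]:
    """Split tupleList text into (name, amount, third_component) records."""
    rows = []
    for name, amount, third in TUPLE_RE.findall(text):
        third_value = None if third == "missing" else float(third)
        rows.append((name, float(amount), third_value))
    return rows


sample = (
    "'ANTHOZOA',63.1,missing 'Scoloplos armiger',66.1,missing "
    "'Spio filicornis',10,missing"
)
rows = parse_tuple_list(sample)
for row in rows:
    print(row)
```

Matching the quoted name explicitly matters because several taxon names contain spaces, so splitting the list on whitespace alone would break those tuples apart.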

“... there is a momentum from organisations such as IHO and WMO to adopt consistent approaches for the vocabulary of their data along the reference implementation of ISO Standards prescribed by the [Open Geospatial Consortium]...”

“The NDG format proved a robust recipient for the data from each community. It produced economical files with few redundant elements, striking about the right balance between weak and strong typing.”

Using CSML

Page 19:

Using CSML

[Diagram labels: conceptual model (UML sketch with Class1, Class2, association ends -End1 (1) and -End2 (*), «datatype» DataType1); UGAS; GML app schema; GML dataset (the NDGPointFeature gml:id="ICES_100" excerpt shown on Page 18); parser]

Managing semantics

Page 20:

Using CSML

Stack of Builders (for UML meta-model)
• current class, object, attribute
• specialised for particular UML→XML mapping

Builder receives:
• filtered SAX events
• built object

Builder returns:
• built object
• new object class
• new Builder (for inheritance through substitutionGroups)

[UML class diagram:]
SAX2::DefaultHandler
«interface» SAX2::DTDHandler: +notationDecl(), +unparsedEntity()
«interface» SAX2::ContentHandler: +setDocumentLocator(), +startDocument(), +endDocument(), +startElement(), +endElement(), +characters(), +processingInstruction(), +ignorableWhitespace(), +startPrefixMapping(), +endPrefixMapping(), +skippedEntity()
«interface» SAX2::ErrorHandler: +error(), +fatalError(), +warning()
«interface» SAX2::EntityResolver: +resolveEntity()
ObjectParser: -process(); association builderStack (*) to ObjectBuilder
ObjectBuilder: -currentClass, -currentObject, -currentAttribute; +xmlElement() : BuilderUnion, +xmlCharacters() : BuilderUnion, +xmlAttribute() : BuilderUnion, +processObject() : BuilderUnion
ObjectPropertyBuilder
ISO19118Builder
«union» BuilderUnion: AbstractClass, AbstractObject, AbstractBuilder
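
A minimal Python sketch of a SAX handler that keeps a stack of builders, in the spirit of the ObjectParser and ObjectBuilder classes in the diagram. It is not the CSML parser. It pushes one generic builder per element and builds plain dicts, whereas the design on the slide specialises builders for a particular UML→XML mapping and can swap builders for substitutionGroups. The method names `add_child` and `finish`, and the `parse` helper, are hypothetical.

```python
import xml.sax


class ObjectBuilder:
    """Builds one object (here a plain dict) from the SAX events routed to it."""

    def __init__(self, class_name):
        self.current_class = class_name
        self.current_object = {"class": class_name, "children": [], "text": ""}

    def characters(self, content):
        self.current_object["text"] += content

    def add_child(self, built):
        self.current_object["children"].append(built)

    def finish(self):
        self.current_object["text"] = self.current_object["text"].strip()
        return self.current_object


class ObjectParser(xml.sax.ContentHandler):
    """Routes SAX events to the builder on top of the stack."""

    def __init__(self):
        super().__init__()
        self.builder_stack = []
        self.root = None

    def startElement(self, name, attrs):
        # A new element opens a new object, so push a builder for it.
        self.builder_stack.append(ObjectBuilder(name))

    def characters(self, content):
        if self.builder_stack:
            self.builder_stack[-1].characters(content)

    def endElement(self, name):
        # The finished object is handed to the parent builder, if there is one.
        built = self.builder_stack.pop().finish()
        if self.builder_stack:
            self.builder_stack[-1].add_child(built)
        else:
            self.root = built


def parse(xml_bytes):
    handler = ObjectParser()
    xml.sax.parseString(xml_bytes, handler)
    return handler.root


root = parse(
    b"<NDGArrayGenerator><arraySize>10001</arraySize>"
    b"<expression>0:5:50000</expression></NDGArrayGenerator>"
)
print(root["class"], [child["class"] for child in root["children"]])
```

The stack mirrors the nesting of the XML, so the builder on top always corresponds to the innermost open element, and a completed object flows back to its parent when the element closes.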