Spatiotemporal Databases e-Science Institute, Edinburgh 01-Nov-2005 1 Andrew Woolf 1 ([email protected] ), Ray Cramer 2 , Marta Gutierrez 3 , Kerstin Kleese van Dam 1 , Siva Kondapalli 2 , Susan Latham 3 , Bryan Lawrence 3 , Roy Lowry 2 , Kevin O’Neill 1 , Ag Stephens 3 1 CCLRC e-Science Centre 2 British Oceanographic Data Centre 3 British Atmospheric Data Centre NERC DataGrid data model and its application
NERC DataGrid data model and its application

Page 1: NERC DataGrid data model and its application

Spatiotemporal Databases, e-Science Institute, Edinburgh

01-Nov-2005

Andrew Woolf1 ([email protected]), Ray Cramer2, Marta Gutierrez3, Kerstin Kleese van Dam1, Siva Kondapalli2, Susan Latham3, Bryan Lawrence3, Roy Lowry2, Kevin O’Neill1, Ag Stephens3

1 CCLRC e-Science Centre
2 British Oceanographic Data Centre
3 British Atmospheric Data Centre

NERC DataGrid data model and its application

Page 2: NERC DataGrid data model and its application

Outline

NERC DataGrid – data integration problem

Semantics as integration key

CSML

Wrapper/mediator architecture

Use and future

Page 3: NERC DataGrid data model and its application

NERC DataGrid

British Atmospheric Data Centre

British Oceanographic Data Centre

Simulations

Assimilation

Page 4: NERC DataGrid data model and its application

NDG data integration

Most (but not all) NDG data is file-based…
On the Grid, no-one should know if you’re a file or relational table… (one service to bind them all)

The file problem
• multiple formats
• focus usually on container, not content

Scientific file format examples (earth sciences):
• netCDF
• HDF4
• HDF5
• GRIB
• NASA Ames
• ...

Page 5: NERC DataGrid data model and its application

NDG data integration

Page 6: NERC DataGrid data model and its application

NDG data integration

Typically, API is fundamental point of reference
• binary format details not always exposed (or guaranteed)
• public API often the only supported access mechanism
• API typically implemented as optimised native library
• why reinvent a well-known working interface?

Data Format Description Language (DFDL)
• XML ‘facade’ to file formats
• earth science files are often giga-scale, so an XML query interface is not likely to be efficient
• encapsulating format not the issue for NDG...
• ...integrating domain-specific semantics efficiently across files and formats is!

Page 7: NERC DataGrid data model and its application

NDG data integration

Information and file contents
• same information in different file formats – want to expose information, not format (seen earlier)
• in addition, semantic information structures may be composed across files

Page 8: NERC DataGrid data model and its application

Integration – semantics

Want semantic access to information, not abstract data
• getData(potential temperature from ERA-40 dataset in North Atlantic from 1990 to 2000)
• not: getData(“era40.nc”, ‘PTMP’, 20:50, 300:340, 190:200)
• or even worse:
    for j=1990:2000
        getData(“era40_”+j+“.nc”, ‘PTMP’, 20:50, 300:340)

Lossy is OK!
• Care less about completeness of representation than semantic unification
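The contrast above can be sketched in a few lines of Python. This is only an illustration of the idea, not NDG code: the catalogue, the region table, and the names `FileRequest` and `resolve` are hypothetical, and the index ranges are simply the ones quoted on the slide. A semantic request is resolved into the per-file reads that the user would otherwise have to write by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRequest:
    """One low-level, file-oriented read that the semantic layer hides."""
    file_name: str
    variable: str
    lat_index: slice
    lon_index: slice


# Hypothetical catalogue: semantic names -> storage details (illustrative values).
CATALOGUE = {
    ("ERA-40", "potential temperature"): {
        "file_pattern": "era40_{year}.nc",
        "variable": "PTMP",
    }
}

# Hypothetical named regions -> index ranges on the dataset grid.
REGIONS = {"North Atlantic": (slice(20, 50), slice(300, 340))}


def resolve(dataset, phenomenon, region, first_year, last_year):
    """Turn a semantic request into the list of per-file reads it implies."""
    entry = CATALOGUE[(dataset, phenomenon)]
    lat, lon = REGIONS[region]
    return [
        FileRequest(entry["file_pattern"].format(year=year), entry["variable"], lat, lon)
        for year in range(first_year, last_year + 1)  # both end years included
    ]


requests = resolve("ERA-40", "potential temperature", "North Atlantic", 1990, 2000)
```

The caller names a phenomenon, a dataset, a region and a period; file names, variable codes and index ranges stay inside the lookup tables.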

Page 9: NERC DataGrid data model and its application

NDG data integration

Integration approaches: warehousing
[Diagram labels: BADC, BODC, format conversion (×2), NDG]

Integration approaches: wrapper/mediator
[Diagram labels: BADC, BODC, wrapper (×2), mediator, Service, data, Interface]

Page 10: NERC DataGrid data model and its application

Integration – semantics

Summary:

What we require is
• semantic access to information (within and across files);
• and to use native (well-known) efficient APIs under the covers

also:
• scalability across providers
• warehousing not an option (tera-scale!)
• enhance access and use, ‘outwards-facing’ (e.g. impacts community, policymakers)
• storage heterogeneity

Page 11: NERC DataGrid data model and its application

Integration – semantics

Database data modelling
• Relational model (Codd, 1970)
• Entity-relationship model (Chen, 1976)
• Semantic data models
• Object-oriented data models (inheritance, aggregation, behaviour)

File-based data modelling
• Far less advanced
• Abstract models (‘variables’, ‘arrays’, etc.; no ‘object’ file formats in widespread use for earth science data)
• API-driven

Page 12: NERC DataGrid data model and its application

Integration – semantics

Fundamentally, an information community is defined by shared semantics
• semantics often (but not always) implicit
• use information semantics for data integration

Semantics as integration ‘key’
• common language across providers (and users)
• supports wrapper/mediator architecture

NDG Solution components:
• semantic data model (Climate Science Modelling Language)
• storage descriptor (wrapper)
• data services (mediator)

Page 13: NERC DataGrid data model and its application

CSML

Geographic ‘features’
• “abstraction of real world phenomena” [ISO 19101]
• Object models for data types – type or instance
• Encapsulate important semantics in universe of discourse

Application schema
• Defines semantic content and logical structure of datasets
• ISO standards provide conceptual toolkit:
  • spatial/temporal referencing
  • geometry (1-, 2-, 3-D)
  • topology
  • dictionaries (phenomena, units, etc.)
  • GML – canonical encoding

[from ISO 19109 “Geographic information – Rules for Application Schema”]

Page 14: NERC DataGrid data model and its application

CSML

CSML aims:
• provide semantic integration mechanism for NDG data
• explore new standards-based interoperability framework
• emphasise content, not container

Design principles:
• offload semantics onto parameter type (‘phenomenon’, observable, measurand)
  • e.g. wind-profiler, balloon temperature sounding
• offload semantics onto CRS
  • e.g. scanning radar, sounding radar
• ‘sensible plotting’ as discriminant
  • ‘in-principle’ unsupervised portrayal
• explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit)

Page 15: NERC DataGrid data model and its application

CSML

Semantic data model
• Climate Science Modelling Language (CSML), http://ndg.nerc.ac.uk/csml
• Weakly-typed conceptual models for range of information types
• Independent of storage concerns
• Based on ISO ‘geographic feature types’ framework
• Defined on basis of geometric and topologic structure

CSML feature type | Description | Examples
TrajectoryFeature | Discrete path in time and space of a platform or instrument. | ship’s cruise track, aircraft’s flight path
PointFeature | Single point measurement. | raingauge measurement
ProfileFeature | Single ‘profile’ of some parameter along a directed line in space. | wind sounding, XBT, CTD, radiosonde
GridFeature | Single time-snapshot of a gridded field. | gridded analysis field
PointSeriesFeature | Series of single datum measurements. | tidegauge, rainfall timeseries
ProfileSeriesFeature | Series of profile-type measurements. | vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature | Timeseries of gridded parameter fields. | numerical weather prediction model, ocean general circulation model

Page 16: NERC DataGrid data model and its application

CSML

CSML feature type examples:
[Figure panels: ProfileSeriesFeature, ProfileFeature, GridFeature]

Page 17: NERC DataGrid data model and its application

Wrapper

Numerical array descriptors
• provides ‘wrapper’ architecture for legacy data files
• proxy for numerical content within feature instances
• ‘Connected’ to data model numerical content through ‘xlink:href’

Three subtypes:
• InlineArray
• ArrayGenerator
• FileExtract (NASAAmes, NetCDF, GRIB)

Composite design pattern for aggregation

[UML class diagram]
«Type» GML::AbstractGMLType: +id, +metaDataProperty, +description, +name
«Type» AbstractArrayDescriptor: +arraySize[1], +uom[0..1], +numericType[0..1], +numericTransform[0..1], +regExpTransform[0..1]
«Type» AggregatedArray: +aggType[1], +aggIndex[1]; +component association (1 to *)
«Type» InlineArray: +values[*]
«Type» ArrayGenerator: +expression[1]
«Type» AbstractFileExtract: +fileName[1]
«Type» NASAAmesExtract: +variableName[1], +index[0..1]
«Type» NetCDFExtract: +variableName[1]
«Type» GRIBExtract: +parameterCode[1], +recordNumber[0..1], +fileOffset[0..1]

Page 18: NERC DataGrid data model and its application

Wrapper

File extract examples:

<NDGNASAAmesExtract>
  <arraySize>526</arraySize>
  <numericType>double</numericType>
  <fileName>/data/BADC/macehead/mh960606.cf1</fileName>
  <variableName>CFC-12</variableName>
</NDGNASAAmesExtract>

<NDGNetCDFExtract gml:id="feat04azimuth">
  <arraySize>10000</arraySize>
  <fileName>radar_data.nc</fileName>
  <variableName>az</variableName>
</NDGNetCDFExtract>

<NDGGRIBExtract>
  <arraySize>320 160</arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas1992010100rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDGGRIBExtract>
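To illustrate how little a client needs in order to act on such a descriptor, the sketch below parses the GRIB example with Python's standard `xml.etree.ElementTree`. The function name `parse_extract` and the shape of the returned dictionary are assumptions made for this example; the slides do not show how NDG tooling reads these elements.

```python
import xml.etree.ElementTree as ET

# The GRIB descriptor from the slide, copied as a string.
DESCRIPTOR = """
<NDGGRIBExtract>
  <arraySize>320 160</arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas1992010100rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDGGRIBExtract>
"""


def parse_extract(xml_text):
    """Hypothetical helper: split a file-extract element into generic and format-specific parts."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    return {
        "kind": root.tag,                                          # which extract subtype
        "shape": [int(n) for n in fields["arraySize"].split()],  # whitespace-separated sizes
        "file_name": fields["fileName"],
        # Everything else is specific to the file format (here: GRIB).
        "format_specific": {
            key: value
            for key, value in fields.items()
            if key not in ("arraySize", "fileName")
        },
    }


info = parse_extract(DESCRIPTOR)
```

The format-specific part is what a native I/O library would need to locate the values; the generic part is enough to reason about array shape without opening the file.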

Page 19: NERC DataGrid data model and its application

Wrapper

Aggregated array
• arrays may be aggregated along an ‘existing’ or ‘new’ dimension

<AggregatedArray gml:id="globaltemperature">
  <arraySize>180 360</arraySize>
  <aggType>existing</aggType>
  <aggIndex>1</aggIndex>
  <component>
    <NetCDFExtract>
      <arraySize>90 360</arraySize>
      <fileName>northern_hemisphere.nc</fileName>
      <variableName>TMP</variableName>
    </NetCDFExtract>
  </component>
  <component>
    <NetCDFExtract>
      <arraySize>90 360</arraySize>
      <fileName>southern_hemisphere.nc</fileName>
      <variableName>TMP</variableName>
    </NetCDFExtract>
  </component>
</AggregatedArray>
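The difference between aggregating along an ‘existing’ and a ‘new’ dimension can be shown with nested Python lists. This sketch assumes that aggIndex is a 1-based dimension number, which is consistent with the example (two 90 × 360 arrays giving 180 × 360) but is not stated on the slide; the functions `aggregate` and `shape` are hypothetical.

```python
def shape(array):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return dims


def aggregate(components, agg_type, agg_index):
    """Combine equally shaped nested lists.

    agg_type 'existing' concatenates along dimension agg_index;
    agg_type 'new' inserts a new dimension at position agg_index.
    agg_index is treated as 1-based (an assumption).
    """
    if agg_type not in ("existing", "new"):
        raise ValueError("agg_type must be 'existing' or 'new'")
    if agg_index == 1:
        if agg_type == "new":
            return list(components)  # stack: one entry per component
        return [row for component in components for row in component]  # concatenate
    # Otherwise descend one level and aggregate the matching sub-arrays.
    return [
        aggregate([component[i] for component in components], agg_type, agg_index - 1)
        for i in range(len(components[0]))
    ]


north = [[1.0] * 360 for _ in range(90)]
south = [[2.0] * 360 for _ in range(90)]

joined = aggregate([north, south], "existing", 1)   # like the slide's example
stacked = aggregate([north, south], "new", 1)       # adds a leading dimension
```

Here `shape(joined)` is `[180, 360]`, matching the arraySize declared on the AggregatedArray, while `shape(stacked)` is `[2, 90, 360]`.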

Page 20: NERC DataGrid data model and its application

Mediator

Data services (mediator)
• Data services expose semantic model:
  • Mappings to third-party data models (e.g. file formats, OPeNDAP)
  • Canonical serialisation (e.g. ISO 19118 UML XML mapping) – Geography Markup Language
• Example services:
  • netCDF file instantiation
  • OPeNDAP delivery
  • Open Geospatial Consortium (OGC) web services, e.g. Web Feature Service, Web Coverage Service
• Pushed down to the file level, data access request should use optimised native file format-specific I/O

Page 21: NERC DataGrid data model and its application

Mediator

Provides semantic abstraction layer

[Diagram labels: instantiateNetCDF(DatasetID, FeatureID); <CSML> documents; filestore; (SAX) demarshalling; CSMLAbstractFeature with +writeNetCDF(); AbstractFileExtract with +read(); outputs: NetCDF, WCS, WFS, OPeNDAP, ...]

Page 22: NERC DataGrid data model and its application

Using CSML

Example of CSML use – MarineXML

[Diagram labels: source data (Biological Species, Chl-a from Satellite, Measured Hydrodynamics, Modelled Hydrodynamics), each with XML and XSD; XSLT; MarineGML (NDG) Feature Types; XML Parser; Data Dictionary; S52 Portrayal Library; SENC; SeeMyDENC]

Diagram annotations:
• Data from different parts of the marine community conforming to a variety of schema (XSD)
• For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FT) defined by CSML. The FTs and XSLT are maintained in a ‘MarineXML registry’
• The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features
• The FTs can then be translated to equivalent FTs for display in the ECDIS system
• Features in the source XSD must be present in the data dictionary.
• Phenomena in the XSD must have an associated portrayal
• ECDIS acts as an example client for the data.

with thanks to Keiran Millard, HR Wallingford

Page 23: NERC DataGrid data model and its application

Using CSML

EU project – MarineXML

“MarineXML is an initiative of the IOC/IODE of UNESCO to improve marine data exchange within the marine community. The European Commission has provided a funding contribution to this initiative as part of its 5th Framework Programme to undertake a ‘pre-standardisation’ task of identifying the approaches the marine community should adopt regarding XML technology to achieve improved data exchange.”

    <gml:definitionMember>
      <om:Phenomenon gml:id="taxon">
        <gml:description>The taxon name</gml:description>
        <gml:name codeSpace="http://www.vliz.be">taxon</gml:name>
      </om:Phenomenon>
    </gml:definitionMember>
  </NDGPhenomenonDefinitions>
  <!--===================================================================-->
  <gml:FeatureCollection>
    <!-- ============================================================== -->
    <gml:featureMember>
      <NDGPointFeature gml:id="ICES_100">
        <NDGPointDomain>
          <domainReference>
            <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
              <location>55.25 6.5</location>
            </NDGPosition>
          </domainReference>
        </NDGPointDomain>
        <gml:rangeSet>
          <gml:DataBlock>
            <gml:rangeParameters>
              <gml:CompositeValue>
                <gml:valueComponents>
                  <gml:measure uom="#tn"/>
                  <gml:measure uom="#amount"/>
                  <gml:measure uom="#gsm"/>
                </gml:valueComponents>
              </gml:CompositeValue>
            </gml:rangeParameters>
            <gml:tupleList>
              'ANTHOZOA',63.1,missing
              'Scoloplos armiger',66.1,missing
              'Spio filicornis',10,missing
              'Spiophanes bombyx',60.3,missing
              'Capitellidae',131.8,missing
              'Pholoe',10,missing
              'Owenia fusiformis',23.4,missing
              'Hypereteone lactea',6.8,missing
              'Anaitides groenlandica',13.2,missing
              'Anaitides mucosa',6.8,missing

“... there is a momentum from organisations such as IHO and WMO to adopt consistent approaches for the vocabulary of their data along the reference implementation of ISO Standards prescribed by the [Open Geospatial Consortium]...”

“The NDG format proved a robust recipient for the data from each community. It produced economical files with few redundant elements, striking about the right balance between weak and strong typing.”

Page 24: NERC DataGrid data model and its application

Conclusions/future

Conclusions
• Mechanism is lossy, in general
  • semantic integration is far more important than completeness of representation
• Emphasis on content, not container
• Mediator services can expose data model
• Well-known community formats – use efficient legacy APIs
• Initial semantic decoration can add context to entire workflow chain
• Loose relationship between legacy file data model and semantic (feature) instance to which it is mapped

Page 25: NERC DataGrid data model and its application

Conclusions/future

Current and future work (NDG)
• Implement tooling:
  • CSML parsing/processing
  • Automated ‘scanner’: {files} → CSML
• Implement NDG data delivery (mediator) services layered over data model

Further perspectives
• Integrate with broader interoperability frameworks (e.g. ‘semantics repositories’, Feature Type Catalogues – WMO, IOC, INSPIRE)
• Generalise approach:
  • meta-model for data modelling
  • ‘data storage description language’ for file mappings (DFDL role?)
  • canonicalised serialisation for workflows

Page 26: NERC DataGrid data model and its application

Conclusions/future

Managing semantics

[Diagram labels: conceptual model (UML sketch: Class1, Class2, association -End1 1 to -End2 *, «datatype» DataType1); define data models; auto-generate XSD; GML app schema; populate dataset instances; GML dataset (shown as the NDGPointFeature gml:id="ICES_100" instance from the MarineXML example); auto-generated parser]

Page 27: NERC DataGrid data model and its application

Conclusions/future

Parser:

Stack of Builders (for UML meta-model)
• current class, object, attribute
• specialised for particular UML-to-XML mapping

Builder receives:
• filtered SAX events
• built object

Builder returns:
• built object
• new object class
• new Builder (for inheritance through substitutionGroups)

[UML class diagram]
SAX2::DefaultHandler
«interface» SAX2::DTDHandler: +notationDecl(), +unparsedEntity()
«interface» SAX2::ContentHandler: +setDocumentLocator(), +startDocument(), +endDocument(), +startElement(), +endElement(), +characters(), +processingInstruction(), +ignorableWhitespace(), +startPrefixMapping(), +endPrefixMapping(), +skippedEntity()
«interface» SAX2::ErrorHandler: +error(), +fatalError(), +warning()
«interface» SAX2::EntityResolver: +resolveEntity()
ObjectParser: -process(); builderStack association to ObjectBuilder (*)
ObjectBuilder: -currentClass, -currentObject, -currentAttribute; +xmlElement() : BuilderUnion, +xmlCharacters() : BuilderUnion, +xmlAttribute() : BuilderUnion, +processObject() : BuilderUnion
ObjectPropertyBuilder, ISO19118Builder
«union» BuilderUnion: AbstractClass, AbstractObject, AbstractBuilder
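The stack-of-builders idea can be sketched with Python's standard `xml.sax` module. This is a much simplified illustration, not the parser described on the slide: it builds plain dictionaries rather than objects of a UML meta-model, it uses a single builder class, and it omits the step in which a builder returns a new, more specialised builder for substitutionGroups. The class names echo the diagram, but their contents here are assumptions.

```python
import xml.sax


class ObjectBuilder:
    """Builds one plain dict from the SAX events of a single element."""

    def __init__(self, name):
        self.obj = {"type": name, "text": "", "children": []}

    def xml_characters(self, content):
        self.obj["text"] += content

    def process_object(self, child):
        """Receive an object built by the builder one level further down."""
        self.obj["children"].append(child)


class ObjectParser(xml.sax.ContentHandler):
    """Routes SAX events to the builder on top of the stack."""

    def __init__(self):
        super().__init__()
        self.builder_stack = []
        self.result = None

    def startElement(self, name, attrs):
        self.builder_stack.append(ObjectBuilder(name))  # new element, new builder

    def characters(self, content):
        if self.builder_stack:
            self.builder_stack[-1].xml_characters(content)

    def endElement(self, name):
        built = self.builder_stack.pop().obj
        built["text"] = built["text"].strip()
        if self.builder_stack:
            self.builder_stack[-1].process_object(built)  # hand result to the parent
        else:
            self.result = built  # the root object is complete


parser = ObjectParser()
xml.sax.parseString(
    b"<NetCDFExtract><fileName>radar_data.nc</fileName>"
    b"<variableName>az</variableName></NetCDFExtract>",
    parser,
)
```

After parsing, `parser.result` holds the root object with its two child objects; each builder saw only the events for its own element and the finished objects of its children.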