NERC DataGrid data model and its application
Andrew Woolf1 ([email protected]), Ray Cramer2, Marta Gutierrez3, Kerstin Kleese van Dam1, Siva Kondapalli2, Susan Latham3, Bryan Lawrence3, Roy Lowry2, Kevin O’Neill1, Ag Stephens3
1 CCLRC e-Science Centre
2 British Oceanographic Data Centre
3 British Atmospheric Data Centre
Spatiotemporal Databases, e-Science Institute, Edinburgh
01-Nov-2005
Outline
NERC DataGrid – data integration problem
Semantics as integration key
CSML
Wrapper/mediator architecture
Use and future
NERC DataGrid
British Atmospheric Data Centre
British Oceanographic Data Centre
Simulations
Assimilation
NDG data integration
Most (but not all) NDG data is file-based…
On the Grid, no-one should know if you’re a file or relational table… (one service to bind them all)
The file problem
• multiple formats
• focus usually on container, not content
Scientific file format examples (earth sciences):
• netCDF
• HDF4
• HDF5
• GRIB
• NASA Ames
• ...
NDG data integration
Typically, API is fundamental point of reference
• binary format details not always exposed (or guaranteed)
• public API often the only supported access mechanism
• API typically implemented as optimised native library
• why reinvent a well-known working interface?
Data Format Description Language (DFDL)
• XML ‘facade’ to file formats
• earth science files often giga-scale, so an XML query interface is not likely to be efficient
• encapsulating format not the issue for NDG...
• ...integrating domain-specific semantics efficiently across files and formats is!
NDG data integration
Information and file contents
• same information in different file formats – want to expose information, not format (seen earlier)
• in addition, semantic information structures may be composed across files
Integration – semantics
Want semantic access to information, not abstract data
• getData(potential temperature from ERA-40 dataset in North Atlantic from 1990 to 2000)
• not: getData(“era40.nc”, ‘PTMP’, 20:50, 300:340, 190:200)
• or even worse:
  for j=1990:2000
    getData(“era40_”+j+“.nc”, ‘PTMP’, 20:50, 300:340)
Lossy is OK!
• Care less about completeness of representation than semantic unification
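The contrast between the semantic request and the index-based request can be sketched in Python. This is only an illustration, not NDG code: the `CATALOGUE` mapping, the `StorageRequest` and `resolve` names, the reading of the three index ranges as latitude, longitude and time, and the `year_offset` of 1800 (chosen so that 1990 to 2000 maps to the slide's 190:200) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageRequest:
    """Index-based request of the kind a file API understands."""
    filename: str
    variable: str
    lat_idx: tuple   # (start, stop) index range, assumed to be latitude
    lon_idx: tuple   # assumed to be longitude
    time_idx: tuple  # assumed to be time


# Hypothetical catalogue entry; the values mirror the slide's example call.
CATALOGUE = {
    ("potential temperature", "ERA-40", "North Atlantic"): {
        "filename": "era40.nc",
        "variable": "PTMP",
        "lat_idx": (20, 50),
        "lon_idx": (300, 340),
        "year_offset": 1800,  # assumption: index = year - 1800
    }
}


def resolve(phenomenon, dataset, region, start_year, end_year):
    """Translate a semantic request into the index-based form."""
    entry = CATALOGUE[(phenomenon, dataset, region)]
    offset = entry["year_offset"]
    return StorageRequest(
        filename=entry["filename"],
        variable=entry["variable"],
        lat_idx=entry["lat_idx"],
        lon_idx=entry["lon_idx"],
        time_idx=(start_year - offset, end_year - offset),
    )


request = resolve("potential temperature", "ERA-40", "North Atlantic", 1990, 2000)
```

The caller only names the phenomenon, dataset, region and period; file names, variable codes and array indices stay behind the lookup.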
NDG data integration
Integration approaches: warehousing
[Diagram labels: NDG; BADC; BODC; format conversion (×2)]
Integration approaches: wrapper/mediator
[Diagram labels: BADC; BODC; wrapper (×2); mediator, Service, data, Interface (×2)]
Integration – semantics
Summary:
What we require is
• semantic access to information (within and across files);
• and to use native (well-known) efficient APIs under the covers
also:
• scalability across providers
• warehousing not an option (tera-scale!)
• enhance access and use, ‘outwards-facing’ (e.g. impacts community, policymakers)
• storage heterogeneity
Integration – semantics
Database data modelling
• Relational model (Codd, 1970)
• Entity-relationship model (Chen, 1976)
• Semantic data models
• Object-oriented data models (inheritance, aggregation, behaviour)
File-based data modelling
• Far less advanced
• Abstract models (‘variables’, ‘arrays’, etc.; no ‘object’ file formats in widespread use for earth science data)
• API-driven
Integration – semantics
Fundamentally, an information community is defined by shared semantics
• semantics often (but not always) implicit
• use information semantics for data integration
Semantics as integration ‘key’
• common language across providers (and users)
• supports wrapper/mediator architecture
NDG Solution components:
• semantic data model (Climate Science Modelling Language)
• storage descriptor (wrapper)
• data services (mediator)
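The split between storage descriptor (wrapper) and data services (mediator) might look roughly like the following Python sketch. All class and method names here (`Wrapper`, `InMemoryWrapper`, `Mediator`, `register`, `get_feature`) and the sample values are hypothetical; a real wrapper would call a format's native library rather than read from a dictionary.

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Storage descriptor side: knows how one provider stores a feature."""

    @abstractmethod
    def read(self, feature_id: str) -> list:
        ...


class InMemoryWrapper(Wrapper):
    """Stand-in for a format-specific wrapper built on a native file API."""

    def __init__(self, records: dict):
        self._records = records

    def read(self, feature_id: str) -> list:
        # A real wrapper would call the format's native library here.
        return self._records[feature_id]


class Mediator:
    """Data service side: one semantic interface over many wrappers."""

    def __init__(self):
        self._wrappers = {}

    def register(self, provider: str, wrapper: Wrapper) -> None:
        self._wrappers[provider] = wrapper

    def get_feature(self, provider: str, feature_id: str) -> dict:
        values = self._wrappers[provider].read(feature_id)
        return {"provider": provider, "feature": feature_id, "values": values}


# Illustrative, made-up records for the two data centres.
mediator = Mediator()
mediator.register("BADC", InMemoryWrapper({"air_temperature": [271.3, 272.0]}))
mediator.register("BODC", InMemoryWrapper({"sea_level": [1.2, 1.4, 1.1]}))
result = mediator.get_feature("BODC", "sea_level")
```

Clients talk only to the mediator in terms of features, so each provider can keep its own storage format behind its wrapper.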
CSML
Geographic ‘features’
• “abstraction of real world phenomena” [ISO 19101]
• Object models for data types – type or instance
• Encapsulate important semantics in universe of discourse
Application schema
• Defines semantic content and logical structure of datasets
• ISO standards provide conceptual
[from ISO 19109 “Geographic information – Rules for Application Schema”]
CSML
CSML aims:
• provide semantic integration mechanism for NDG data
• explore new standards-based interoperability framework
• emphasise content, not container
Design principles:
• offload semantics onto parameter type (‘phenomenon’, observable, measurand)
  • e.g. wind-profiler, balloon temperature sounding
• offload semantics onto CRS
  • e.g. scanning radar, sounding radar
• ‘sensible plotting’ as discriminant
  • ‘in-principle’ unsupervised portrayal
• explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit)
CSML
Semantic data model
• Climate Science Modelling Language (CSML), http://ndg.nerc.ac.uk/csml
• Weakly-typed conceptual models for range of information types
• Independent of storage concerns
• Based on ISO ‘geographic feature types’ framework
• Defined on basis of geometric and topologic structure
CSML feature type | Description | Examples
TrajectoryFeature | Discrete path in time and space of a platform or instrument. | ship’s cruise track, aircraft’s flight path
PointFeature | Single point measurement. | raingauge measurement
ProfileFeature | Single ‘profile’ of some parameter along a directed line in space. | wind sounding, XBT, CTD, radiosonde
GridFeature | Single time-snapshot of a gridded field. | gridded analysis field
PointSeriesFeature | Series of single datum measurements. | tidegauge, rainfall timeseries
ProfileSeriesFeature | Series of profile-type measurements. | vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature | Timeseries of gridded parameter fields. | numerical weather prediction model, ocean general circulation model
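To make the feature-type idea concrete, two of the types in the table could be modelled as plain Python dataclasses. The field names (`parameter`, `position`, `time`, `times`, `values`), the `at` helper and the sample values are assumptions for illustration; they are not taken from the CSML schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PointFeature:
    """Single point measurement (fields are illustrative assumptions)."""
    parameter: str
    position: Tuple[float, float]  # (latitude, longitude)
    time: str
    value: float


@dataclass
class PointSeriesFeature:
    """Series of single datum measurements at one position."""
    parameter: str
    position: Tuple[float, float]
    times: List[str]
    values: List[float]

    def at(self, index: int) -> PointFeature:
        """Extract one measurement of the series as a PointFeature."""
        return PointFeature(self.parameter, self.position,
                            self.times[index], self.values[index])


# Made-up rainfall timeseries, matching the 'rainfall timeseries' example.
gauge = PointSeriesFeature(
    parameter="rainfall",
    position=(55.95, -3.19),
    times=["2005-11-01T00:00Z", "2005-11-01T01:00Z"],
    values=[0.4, 1.1],
)
reading = gauge.at(1)
```

The types are distinguished by their geometric and temporal structure, while the measured quantity is carried as a parameter value rather than as a separate class, which is one way to read 'weakly-typed'.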
CSML
CSML feature type examples:
ProfileSeriesFeature
ProfileFeature
GridFeature
Mediator
Data services (mediator)
• Data services expose semantic model:
  • Mappings to third-party data models (e.g. file formats, OPeNDAP)
  • Canonical serialisation (e.g. ISO 19118 UML XML mapping) – Geography Markup Language
• Example services:
  • netCDF file instantiation
  • OPeNDAP delivery
  • Open Geospatial Consortium (OGC) web services, e.g. Web Feature Service, Web Coverage Service
• Pushed down to the file level, data access request should use optimised native file format-specific I/O
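As a rough sketch of the first step such a service might take, the Python below uses the standard `xml.sax` module to look up which file and variable back a requested feature, so that the actual read can then go through native I/O. The XML element and attribute names (`Dataset`, `Feature`, `FileExtract`, `fileName`, `variableName`) and the `locate_extract` function are invented for this example and are not the real CSML schema.

```python
import xml.sax

# Hypothetical descriptor; element names are NOT taken from the CSML schema.
DESCRIPTOR = b"""<Dataset id="ds1">
  <Feature id="f1" type="GridFeature">
    <FileExtract fileName="era40.nc" variableName="PTMP"/>
  </Feature>
  <Feature id="f2" type="PointSeriesFeature">
    <FileExtract fileName="gauge.nc" variableName="RAIN"/>
  </Feature>
</Dataset>"""


class FeatureHandler(xml.sax.ContentHandler):
    """Collects the file extract that belongs to one feature ID."""

    def __init__(self, feature_id):
        super().__init__()
        self.feature_id = feature_id
        self._inside = False
        self.extract = None

    def startElement(self, name, attrs):
        if name == "Feature":
            self._inside = attrs.get("id") == self.feature_id
        elif name == "FileExtract" and self._inside:
            self.extract = (attrs["fileName"], attrs["variableName"])

    def endElement(self, name):
        if name == "Feature":
            self._inside = False


def locate_extract(descriptor: bytes, feature_id: str):
    """Return (file name, variable name) for a feature, or None if absent."""
    handler = FeatureHandler(feature_id)
    xml.sax.parseString(descriptor, handler)
    return handler.extract


location = locate_extract(DESCRIPTOR, "f2")
```

Only the small descriptor is parsed as XML; the bulk data stays in its native file and would be read with the format's own library.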
Mediator
Provides semantic abstraction layer
[Diagram labels: instantiateNetCDF(DatasetID, FeatureID); <CSML>; (SAX) demarshalling; classes CSMLAbstractFeature and AbstractFileExtract; operations +writeNetCDF() and +read(); filestore; outputs NetCDF, WCS, WFS, OPeNDAP, ....]
Using CSML
Example of CSML use – MarineXML
[Diagram labels: SeeMyDENC; Data Dictionary; S52 Portrayal Library; SENC; MarineGML (NDG) Feature Types; XML Parser; XML; XSLT; XSD]
Source data: Biological Species; Chl-a from Satellite; Measured Hydrodynamics; Modelled Hydrodynamics
• Data from different parts of the marine community conforming to a variety of schema (XSD)
• Features in the source XSD must be present in the data dictionary.
• Phenomena in the XSD must have an associated portrayal
• For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FT) defined by CSML. The FTs and XSLT are maintained in a ‘MarineXML registry’
• The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features
• The FTs can then be translated to equivalent FTs for display in the ECDIS system
• ECDIS acts as an example client for the data.
with thanks to Keiran Millard, HR Wallingford
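The translation step can be illustrated in Python. Since the standard library has no XSLT processor, this sketch swaps in a plain `xml.etree.ElementTree` function for the XSLT; the source record layout, the `DATA_DICTIONARY` entries and the `to_generic_feature` name are all hypothetical and only meant to show the shape of the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical data dictionary of phenomena known to the registry.
DATA_DICTIONARY = {"chlorophyll-a", "current_speed"}


def to_generic_feature(source_xml: str) -> ET.Element:
    """Translate one community-specific record into a generic feature.

    Stands in for the per-schema XSLT; rejects phenomena that are
    missing from the data dictionary.
    """
    source = ET.fromstring(source_xml)
    phenomenon = source.get("phenomenon")
    if phenomenon not in DATA_DICTIONARY:
        raise KeyError(f"{phenomenon!r} is not in the data dictionary")
    # The feature type stays generic; the phenomenon is just a parameter.
    feature = ET.Element("PointFeature", {"parameter": phenomenon})
    ET.SubElement(feature, "value").text = source.findtext("reading")
    return feature


# Made-up source record in an invented community schema.
satellite_record = (
    '<observation phenomenon="chlorophyll-a">'
    "<reading>0.8</reading>"
    "</observation>"
)
feature = to_generic_feature(satellite_record)
encoded = ET.tostring(feature, encoding="unicode")
```

Each source schema would get its own translation of this kind, all targeting the same small set of generic feature types.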
Using CSML
“MarineXML is an initiative of the IOC/IODE of UNESCO to improve marine data exchange within the marine community. The European Commission has provided a funding contribution to this initiative as part of its 5th Framework Programme to undertake a ‘pre-standardisation’ task of identifying the approaches the marine community should adopt regarding XML technology to achieve improved data exchange.”
“... there is a momentum from organisations such as IHO and WMO to adopt consistent approaches for the vocabulary of their data along the reference implementation of ISO Standards prescribed by the [Open Geospatial Consortium]...”
“The NDG format proved a robust recipient for the data from each community. It produced economical files with few redundant elements, striking about the right balance between weak and strong typing.”
Conclusions/future
Conclusions
• Mechanism is lossy, in general
  • semantic integration is far more important than completeness of representation
• Emphasis on content, not container
• Mediator services can expose data model
• Well-known community formats – use efficient legacy APIs
• Initial semantic decoration can add context to entire workflow chain
• Loose relationship between legacy file data model and semantic (feature) instance to which it is mapped
Conclusions/future
Current and future work (NDG)
• Implement tooling:
  • CSML parsing/processing
  • Automated ‘scanner’: {files} → CSML
• Implement NDG data delivery (mediator) services layered over data model
Further perspectives
• Integrate with broader interoperability frameworks (e.g. ‘semantics repositories’ Feature Type Catalogues – WMO, IOC, INSPIRE)
• Generalise approach:
  • meta-model for data modelling
  • ‘data storage description language’ for file mappings (DFDL role?)
  • canonicalised serialisation for workflows