AUKEGGS Workshop, ANU, Canberra, 29 November 2006
Implementing CSML Feature Types in applications within the NERC DataGrid
Dominic Lowe, British Atmospheric Data Centre, CCLRC
and NDG team (Andrew Woolf, Bryan Lawrence et al.)
http://ndg.nerc.ac.uk
Complexity + Volume + Remote Access = Grid Challenge
British Atmospheric Data Centre
British Oceanographic Data Centre
NCAR
NERC DataGrid
CSML Feature Model
• Intermediary XML data format
• Represents Atmospheric, Oceanographic data
• GML Application schema
• Features based on geometry, e.g.:
• GridSeriesFeature
• PointFeature
• TrajectoryFeature
• ...
CSML Parser
XML elements:
  <GridFeature>...</GridFeature>
Python classes:
  class GridFeature: ...
Provides mappings (and conversion) between XML model and python object model.
toXML()
fromXML()
Implemented in Python; uses the C module cElementTree.
Performance:
  Length of XML (lines)   Time to parse (CPU secs)
  87                      0.010
  152                     0.010
  369                     0.040
  11,482                  0.570
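A minimal sketch of the XML-to-object mapping idea, assuming a toy GridFeature class with only an id attribute and a description child. This is not the real csml.parser class; the attribute names are invented for illustration. It uses the standard library xml.etree.ElementTree, which in current Python versions picks up the C accelerator automatically.

```python
import xml.etree.ElementTree as ET


class GridFeature:
    """Toy stand-in for a CSML feature class (hypothetical, not csml.parser)."""

    TAG = "GridFeature"

    def __init__(self, id=None, description=None):
        self.id = id
        self.description = description

    def fromXML(self, elem):
        """Populate this object from an ElementTree element."""
        self.id = elem.get("id")
        self.description = elem.findtext("description")
        return self

    def toXML(self):
        """Build an ElementTree element from this object's attributes."""
        elem = ET.Element(self.TAG)
        if self.id is not None:
            elem.set("id", self.id)
        if self.description is not None:
            ET.SubElement(elem, "description").text = self.description
        return elem


# Round trip: XML text -> Python object -> XML text
xml_text = ('<GridFeature id="gf1">'
            '<description>sea surface temperature</description>'
            '</GridFeature>')
feature = GridFeature().fromXML(ET.fromstring(xml_text))
round_trip = ET.tostring(feature.toXML(), encoding="unicode")
```

Once the feature is a Python object, it can be edited through ordinary attribute access and written back out without touching an XML API.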
CSML Parser
Hierarchical: calling the fromXML() or toXML() method of the root element calls the corresponding methods of its child elements.
<Dataset>
  <FeatureCollection>
    <GridSeriesFeature>
      <rangeSet>...</rangeSet>
      <domain>...</domain>
      <...></...>
    </GridSeriesFeature>
    and so on ...
  </FeatureCollection>
</Dataset>
Dataset.toXML()
  FeatureCollection.toXML()
    GridSeriesFeature(AbstractFeature).toXML()
      RangeSet.toXML()
        ..... and so on
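The delegation pattern described above can be sketched as follows. The Node base class and its children list are assumptions made for this illustration; the real parser classes will differ. Each object builds its own element and then asks each child to serialise itself.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical base class: serialise self, then delegate to children."""

    tag = "Node"

    def __init__(self, children=None, text=None):
        self.children = list(children or [])
        self.text = text

    def toXML(self):
        elem = ET.Element(self.tag)
        elem.text = self.text
        for child in self.children:
            # Each child is responsible for its own XML representation.
            elem.append(child.toXML())
        return elem


class RangeSet(Node):
    tag = "rangeSet"


class Domain(Node):
    tag = "domain"


class GridSeriesFeature(Node):
    tag = "GridSeriesFeature"


class FeatureCollection(Node):
    tag = "FeatureCollection"


class Dataset(Node):
    tag = "Dataset"


feature = GridSeriesFeature([RangeSet(text="..."), Domain(text="...")])
dataset = Dataset([FeatureCollection([feature])])

# One call on the root walks the whole hierarchy.
root = dataset.toXML()
tags = [elem.tag for elem in root.iter()]
```

Because every class carries its own toXML(), calling it on a single feature yields just that fragment, which is what makes partial conversion possible.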
CSML Parser
Process the whole document or parts of the document
To convert a whole document:
From XML to Python:
  tree = ElementTree(file='mycsmlfile.xml')
  dataset = csml.parser.Dataset()
  dataset.fromXML(tree.getroot())
From Python to XML:
  csmldoc = dataset.toXML()
Or just convert a fragment:
  gsFeature = GridSeriesFeature()
  gsFeature.fromXML(xmlFragment)
  gsXML = gsFeature.toXML()
Allows for:
• easy editing of documents
• addition of new features
• redefining existing features, e.g. subsetting
CSML Parser – Creating CSML
Can be used to create CSML documents (or fragments) from scratch:
  gs = GridSeriesFeature(id='myGS', domain=mydomain, rangeSet=myRS)
  ps = PointFeature(id='mypoint', domain=mydomain2, rangeSet=myrs2)
  fc = FeatureCollection(members=[gs, ps, ...])
  ds = Dataset(id='mycsmldocument', featureCollection=fc)
  ds.toXML()  ->  mycsmldocument.xml
No need for data providers to understand XML APIs
Very useful for performing operations on features
CSML Parser
Two-level API to the parser object model:
Parser level (mainly used by me + NDG data providers):
dataset.featureCollection.members[3].profileSeriesDomain
Wrapped by higher level interface (used by client applications)
  getListOfFeatures(csmldoc)       # get list of available features
  getDomain(feature)               # get domain info
  getAffordances(feature)          # get available operations
  subsetFeature(feature, subset)   # operation: request subset
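A rough sketch of how a higher-level layer might wrap the object model. Only the four function names come from the slide; the toy dataclasses, the argument shapes, and the return values are assumptions for illustration and are not the actual CSML API.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    id: str
    domain: dict  # axis name -> list of coordinate values (assumed shape)
    affordances: list = field(default_factory=lambda: ["subset"])


@dataclass
class FeatureCollection:
    members: list


@dataclass
class Dataset:
    id: str
    featureCollection: FeatureCollection


# Higher-level wrappers: clients need not walk the object tree themselves.
def getListOfFeatures(dataset):
    return [feature.id for feature in dataset.featureCollection.members]


def getDomain(feature):
    return feature.domain


def getAffordances(feature):
    return feature.affordances


def subsetFeature(feature, subset):
    """Return a new feature restricted to (low, high) bounds per axis."""
    new_domain = {}
    for axis, values in feature.domain.items():
        if axis in subset:
            low, high = subset[axis]
            values = [v for v in values if low <= v <= high]
        new_domain[axis] = list(values)
    return Feature(feature.id + "_subset", new_domain, list(feature.affordances))


ds = Dataset("demo", FeatureCollection([Feature("sst", {"time": [0, 6, 12, 18]})]))
names = getListOfFeatures(ds)                 # high-level access
sst = ds.featureCollection.members[0]         # parser-level access
sub = subsetFeature(sst, {"time": (6, 12)})
```

The client application only sees the four functions, so the underlying object model can change without breaking callers.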
Access to underlying data
Multiple I/O libraries: cdms, NAppy, others...
CSML code talks to a single DataInterface class that provides a uniform wrapper for different file access methods.
Easy to add more data formats - just need to write the correct wrapper methods (getData, getSubsetOfData, getVariable ...)
Similar interface needed for RDBMS access (not yet implemented)
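One possible shape for such a wrapper, sketched under assumptions: the backend registry keyed by file extension, the method signatures, and the CSV stand-in backend are all invented here for illustration. A real backend would wrap cdms, NAppy or similar; those libraries are not called in this sketch.

```python
import csv
import os
import tempfile

_BACKENDS = {}  # file extension -> backend class


def register(extension):
    """Class decorator that registers a backend for one file extension."""
    def decorator(cls):
        _BACKENDS[extension] = cls
        return cls
    return decorator


@register(".csv")
class CSVBackend:
    """Stand-in backend: each CSV column is treated as a variable."""

    def __init__(self, path):
        with open(path, newline="") as handle:
            self._rows = list(csv.DictReader(handle))

    def getData(self, variable):
        return [float(row[variable]) for row in self._rows]

    def getSubsetOfData(self, variable, start, stop):
        return self.getData(variable)[start:stop]


class DataInterface:
    """Single entry point; picks a backend from the file extension."""

    def __init__(self, path):
        extension = os.path.splitext(path)[1].lower()
        if extension not in _BACKENDS:
            raise ValueError("no backend registered for %r" % extension)
        self._backend = _BACKENDS[extension](path)

    def getData(self, variable):
        return self._backend.getData(variable)

    def getSubsetOfData(self, variable, start, stop):
        return self._backend.getSubsetOfData(variable, start, stop)


# Usage: write a small file, then read it through the uniform interface.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("temperature,pressure\n280.5,1000\n281.0,990\n279.8,985\n")
data = DataInterface(tmp.name)
temps = data.getData("temperature")
first_two = data.getSubsetOfData("pressure", 0, 2)
os.remove(tmp.name)
```

Adding a format then means writing one more registered backend class; the calling code does not change.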
CSML tooling - Use Case #1: Scanning at BADC
Multiple data formats: NetCDF, NASA Ames, GRIB, PP
Feature identification challenges
Scanner has concept of a FeatureFileMap + Config options
Creates parser objects, then calls csml.parser.dataset.toXML() to create document.
By using the parser, the scanner does not have to worry about XML details.
CSML tooling - Use Case #2: Scanning at BODC
Metadata in Oracle Database.
Python-Oracle link to extract metadata
Create parser objects, then call toXML()
By using the parser, the scanner does not have to worry about XML details.
CSML tooling - Use Case #3: Subsetting operation
Query CSML document, getFeatureList() etc.
Subset a CSML dataset; return a CSML document + new NetCDF file
Subset multiple datasets; return a CSML document describing both + NetCDF files
Subset datasets from different data providers and supply them in a single CSML file + NetCDF files
All simplified by use of parser 'objects'.
CSML tooling - Use Case #4: CSML Updates
Datasets change!
BADC has automatic ingest scripts.
Datasets change often, and without warning!
Feasible to automate metadata updates:
• Use the parser to update the existing CSML document when a dataset changes
• Or rescan the dataset periodically and write a new CSML document
Not yet implemented.
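As a sketch of the periodic-rescan option, a check like the following could decide when a CSML document is stale. The function name needs_rescan and the use of file modification times are assumptions for illustration; the slides do not describe a change-detection mechanism.

```python
import os
import tempfile


def needs_rescan(data_paths, csml_path):
    """True if the CSML document is missing or older than any data file."""
    if not os.path.exists(csml_path):
        return True
    csml_mtime = os.path.getmtime(csml_path)
    return any(os.path.getmtime(path) > csml_mtime for path in data_paths)


# Demonstration with empty placeholder files and explicit timestamps.
with tempfile.TemporaryDirectory() as workdir:
    data_file = os.path.join(workdir, "obs.nc")
    csml_file = os.path.join(workdir, "obs.xml")

    open(data_file, "w").close()
    before = needs_rescan([data_file], csml_file)      # no document yet

    open(csml_file, "w").close()
    os.utime(data_file, (1000, 1000))                   # data older than document
    os.utime(csml_file, (2000, 2000))
    up_to_date = needs_rescan([data_file], csml_file)

    os.utime(data_file, (3000, 3000))                   # simulate an ingest update
    after_ingest = needs_rescan([data_file], csml_file)
```

A scheduled job could run this check and re-run the scanner only for datasets where it returns True.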
CSML tooling - Use Case #5: NDG power user
Writes bespoke Python scripts to access data
e.g. for a dataset that is updated daily:
uses the CSML API to download a subset every day, e.g. temperature at certain locations.
CSML tooling - Use Case #6: Integrate CSML into Applications
High-level API is easy to use
Integrated with:
• BADC DataExtractor
• TPAC WCS
• Meteorologisk institutt (Norway)
BADC Data Extractor
TPAC WCS
Norwegian Met Office
Summary
• Modular set of tools
• “Features as Objects” instead of just XML
• Many use cases simplified by object model
• High level API – easy integration of features with applications
• CSML v2 parser under development; more sophisticated than v1 parser & may be adaptable for other domains.