NERC DataGrid Status: ESP June 2004
Bryan Lawrence on behalf of the NDG, BADC and BODC.
Ray Cramer, Marta Gutierrez, Kerstin Kleese, Siva Kondapalli, Sue Latham, Roy Lowry, Kevin O’Neill, Ag Stephens, Andrew Woolf
British Atmospheric Data Centre
http://badc.nerc.ac.uk
Outline
• NDG aims and metadata taxonomy (review)
• Demonstration of NDG in action (no grid services yet, but the shape of things to come should be clear)
  – “Stub B”
  – New tool: DataExtractor
• Status
  – Issues with metadata:
     • Chemistry data at BADC
     • Numerical simulation discovery
  – … and back to status
http://ndg.nerc.ac.uk
[Figure: data sources combined by NDG: the British Atmospheric Data Centre, the British Oceanographic Data Centre, simulations and assimilation]
Complexity + Volume + Remote Access = Grid Challenge
NDG Metadata Taxonomy
NDG Metadata Architecture
Service-based model:
• clear separation between discovery and use
• the discovery service is standards-compliant and interoperable
(D) - Discovery
NDG supports multiple discovery services (“build your own”):
[Figure: existing metadata passes through an XSLT ingest transformation into intermediate schema document(s) (XML); XSLT processors then generate Directory Interchange Format, Dublin Core and GEO Profile (Z39.50) records, with ISO 19115 and the Catalogue Interoperability Protocol as possible further targets. Also shown: OAI and an NDG discovery service element.]
OAI: Open Archives Initiative, a digital library protocol for harvesting metadata.
Multiple protocol support will be built into the “NDG Vanilla Discovery Service”.
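The “build your own” idea is that one intermediate record feeds any number of output formats. The pipeline itself uses XSLT; the Python sketch below is only a stand-in showing the shape of it. The intermediate field names (`id`, `title`, `abstract`) and the three-field DIF record are hypothetical simplifications (a real DIF has many more fields); the Dublin Core and `oai_dc` namespace URIs are the standard ones.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Callable

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)

# Hypothetical intermediate-schema record (illustrative field names only).
RECORD = {"id": "example-entry-1", "title": "Example dataset",
          "abstract": "Placeholder abstract."}


def to_dublin_core(rec: dict) -> ET.Element:
    """Map the intermediate record onto simple Dublin Core (oai_dc)."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    ET.SubElement(root, f"{{{DC}}}identifier").text = rec["id"]
    ET.SubElement(root, f"{{{DC}}}title").text = rec["title"]
    ET.SubElement(root, f"{{{DC}}}description").text = rec["abstract"]
    return root


def to_dif(rec: dict) -> ET.Element:
    """Map the same record onto a heavily simplified DIF-like record."""
    root = ET.Element("DIF")
    ET.SubElement(root, "Entry_ID").text = rec["id"]
    ET.SubElement(root, "Entry_Title").text = rec["title"]
    ET.SubElement(root, "Summary").text = rec["abstract"]
    return root


# Adding a discovery format means adding one entry to this table.
FORMATS: dict[str, Callable[[dict], ET.Element]] = {
    "oai_dc": to_dublin_core,
    "dif": to_dif,
}


def render(rec: dict, fmt: str) -> str:
    return ET.tostring(FORMATS[fmt](rec), encoding="unicode")
```

Keeping the per-format mapping in a lookup table mirrors the one-XSLT-per-format design: the intermediate document never changes when a new discovery service is added.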
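NDG Structure
[Figure: NDG query and data delivery]
• A user or software agent generates a query (Q, as XML) and selects the query type.
• Discovery: the query goes to the NDG portal or to one or more local NDG databases, which return discovery (D) records. The user can browse and redefine the query; one or more documents are then delivered as XML: detailed (B) metadata, and documents and annotations (C).
• Data: if a local NDG database exists, the A metadata is ingested. The user defines a data request (Q, as XML, against A); the data is extracted from the physical data and delivered.
• Note that the A definitions in a request do not need to match any ingested A.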
[Figure: NDG deployment on the NERC Grid and the wider Internet. Components: NDG wrappers at BADC, BODC and a research group; online data and XML databases; a tape robot; satellite and supercomputer data sources; research group data sources; grid users and software agents; the NDG web portal; Internet users and ESG (& other) applications connected via Internet links.]
Flexible Information Return
• Can order responses by title or data centre (default: random)
• Choose to return either data or “B” metadata
• Look at DIFs in either HTML or XML
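The ordering options above amount to a sort key chosen at query time, with a shuffle as the default. A minimal sketch, assuming each hit carries a title and a data-centre name (the `Hit` type and the `by` parameter values are hypothetical, not part of the NDG interface):

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    title: str
    data_centre: str


def order_hits(hits: list[Hit], by: str | None = None,
               rng: random.Random | None = None) -> list[Hit]:
    """Order discovery hits by 'title' or 'data_centre'; default is random."""
    if by == "title":
        return sorted(hits, key=lambda h: (h.title.lower(), h.data_centre))
    if by == "data_centre":
        return sorted(hits, key=lambda h: (h.data_centre, h.title.lower()))
    if by is None:
        shuffled = list(hits)
        # A caller-supplied Random makes the "random" order reproducible.
        (rng or random.Random()).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown ordering: {by!r}")
```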
Role of B metadata: domain ontology
• B metadata is a store of metadata intended to:
  – allow the production of the various “industry standard” discovery formats (DIF, DC, FGDC/GEO, ISO 19115)
  – provide a more complete metadata store than the usual discovery formats demand, leveraging the metadata holdings of the data centres
  – allow a smooth link across to the data browse and use elements of the NDG
• Expected to expand in importance as we add more semantic detail to the schema
B metadata – a simplified view
[Figure: entity-relationship diagram of the B metadata]
• Activity, e.g. project, programme, experiment
• Observation station: stationary, e.g. land station, mooring; moving, e.g. ship, aircraft
• Data production tools: models, e.g. UM; instruments, e.g. CTD, sonde, lidar
• Data entities: basic data entities, e.g. Lagrangian path, ensemble, sample, profile, section, n-dimensional dataset; derived data entities, e.g. climatology, timeseries, integration
• Dataset types: simulation, analysis, measurement
• Common entities: organisation, people, roles; spatio-temporal, e.g. area, place, grid, point, trajectory, date/time
Relationships: includes/included-in, is-a, deploys-a/deployed-on-a, produces/output-by.
How is the B metadata implemented?
The core linking concept is the deployment: the deployment of a data production tool at an observation station, on behalf of an activity, that produces a data entity.
[Figure: Deployment at the centre, linked to Activity, Data Production Tool, Observation Station and Data Entity]
Each of the main metadata objects has security data attached to it, so access control can also be applied to queries on the metadata.
Deployments link the metadata records into a structure that can be turned into navigable XML, using XQuery or XSLT, with any of the record types as the root element.
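To make the linking concrete, here is a small Python sketch (a stand-in for the XQuery/XSLT step, not the NDG implementation). Each deployment joins one activity, tool, station and data entity; starting from any record, its neighbours are found through the deployments it takes part in. The per-record `allowed_roles` field is a hypothetical model of the attached security data: records the user may not see are dropped from the answer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    kind: str  # "Activity", "DataProductionTool", "ObservationStation" or "DataEntity"
    name: str
    # Hypothetical security data: roles allowed to see the record; empty means public.
    allowed_roles: frozenset = frozenset()


@dataclass(frozen=True)
class Deployment:
    """A tool deployed at a station, on behalf of an activity, producing a data entity."""
    activity: str
    tool: str
    station: str
    data_entity: str


def visible(rec: Record, roles: set) -> bool:
    return not rec.allowed_roles or bool(rec.allowed_roles & roles)


def related(root_id: str, records: dict, deployments: list, roles: set) -> dict:
    """Group the records linked to root_id through deployments, by kind.

    Any record type can serve as the root; invisible records are skipped,
    so the security data is applied to the metadata query itself.
    """
    if not visible(records[root_id], roles):
        return {}
    out: dict = {}
    for dep in deployments:
        ends = (dep.activity, dep.tool, dep.station, dep.data_entity)
        if root_id not in ends:
            continue
        for rid in ends:
            rec = records[rid]
            if rid == root_id or not visible(rec, roles):
                continue
            bucket = out.setdefault(rec.kind, [])
            if rid not in bucket:
                bucket.append(rid)
    return out


# Hypothetical example records; names are placeholders.
RECORDS = {
    "act1": Record("Activity", "example campaign"),
    "tool1": Record("DataProductionTool", "example lidar"),
    "stn1": Record("ObservationStation", "example aircraft"),
    "de1": Record("DataEntity", "public profiles"),
    "de2": Record("DataEntity", "restricted profiles", frozenset({"ukmo-obs"})),
}
DEPLOYMENTS = [Deployment("act1", "tool1", "stn1", "de1"),
               Deployment("act1", "tool1", "stn1", "de2")]
```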
“Stub B” – what is it?
“B” metadata works well in databases, but what about:
– presentation
– standalone generation of “D”
– storing metadata locally as files?
A raw B record for a data entity contains just:
– the basic data entity details
– a series of references to related records
It contains no details such as:
– activity name
– instrument name
– station
“Stub B” is the base entity expanded through its own related deployments and internal references.
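The expansion step can be sketched as follows: start from the raw data-entity record, follow its deployment references, and inline each referenced record's kind, id and name so the result stands alone as an XML file. The record layout, ids and element names are hypothetical placeholders, not the NDG B schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical raw B records: each holds its own details plus references
# (ids) to related records, but not the related records' details.
RAW_B = {
    "de-001": {"kind": "DataEntity", "name": "example data entity",
               "deployments": ["dep-001"]},
    "dep-001": {"kind": "Deployment", "refs": ["act-001", "tool-001", "stn-001"]},
    "act-001": {"kind": "Activity", "name": "example activity"},
    "tool-001": {"kind": "DataProductionTool", "name": "example instrument"},
    "stn-001": {"kind": "ObservationStation", "name": "example station"},
}


def stub_b(entity_id: str, raw: dict) -> ET.Element:
    """Expand a raw data-entity record through its own deployments.

    Referenced records are inlined with their kind, id and name, so the
    result can be presented or stored as a file without further lookups.
    """
    entity = raw[entity_id]
    root = ET.Element(entity["kind"], id=entity_id)
    ET.SubElement(root, "name").text = entity["name"]
    for dep_id in entity.get("deployments", []):
        dep_el = ET.SubElement(root, "Deployment", id=dep_id)
        for ref in raw[dep_id]["refs"]:
            target = raw[ref]
            ET.SubElement(dep_el, target["kind"], id=ref).text = target["name"]
    return root


print(ET.tostring(stub_b("de-001", RAW_B), encoding="unicode"))
```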
Role of Stub B
• Makes application developers’ lives easier, especially in the presentation of search results
• Allows off-line storage of metadata by users
• Basis of D production via XSLT
• Hook into main B repositories
• Potential discovery format (while there are lots around already… this could allow more “discipline dependent” discovery)
Discovery Metadata Usage
[Figure: Data Centre 1 and Data Centre 2 each generate a local discovery metadata store (DIF) from a local “B” metadata store. A central discovery metadata (D) store of DIFs is populated by OAI harvesting and offers a query interface. The user queries it and receives results; queries the local B metadata and receives a “stub B” reply; stores queries/results to local disc; and investigates the data in detail through data browse and use (“A” metadata).]
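The harvesting arrow in this picture follows the OAI-PMH request pattern: a `ListRecords` request with a `metadataPrefix`, then repeated requests carrying only the `resumptionToken` until an empty token marks the end of the list. The sketch below collects record identifiers that way. The `fetch` hook and the default `oai_dc` prefix are assumptions for illustration (a DIF feed would use whatever prefix the data centre advertises); the base URL in any real use would be the data centre's own endpoint.

```python
from __future__ import annotations

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def _http_get(url: str) -> bytes:
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def list_record_ids(base_url: str, metadata_prefix: str = "oai_dc",
                    fetch: Callable[[str], bytes] = _http_get) -> Iterator[str]:
    """Yield the identifier of every harvested record, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
            ident = header.find("oai:identifier", OAI_NS)
            if ident is not None and ident.text:
                yield ident.text.strip()
        token = root.find(".//oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty one: the list is complete
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing `fetch` in as a parameter keeps the paging logic testable without a live repository.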
Background activity is being carried out in parallel with the GODIVA/CCLRC e-science collaboration (spectral-to-gridpoint conversion, CDMS and visualisation tools).
Download either the plot or the data that went into it.
Where are we?
• Major effort on defining feature types for observation types, so that we can build an OGC/ISO-compatible data extractor for observations and numerical data
  – Main thrust for Andrew Woolf and 0.5 new FTE
  – Ag Stephens contributing when time is available
• Security infrastructure development
  – Collaboration with CCLRC e-science, ECOGrid and 0.5 FTE
• Ongoing work on metadata definition and population:
  – Oceanographic data
     • Siva Kondapalli
  – Chemistry data
     • Main thrust for Sue Latham
  – Numerical modelling data
     • DIF numerical definition (moving to ISO), BADC and UK community
     • Katherine Bouton’s work at NCAS/CGAM
  – Remote sensing data
     • Collaboration with NEODC and PML
• Ongoing work on databases and interfaces, DIF to ISO and “B”
  – Kevin O’Neill and Marta Gutierrez
Authorisation
• Role-based access:
<dataset>
  <host> badc.nerc.ac.uk </host>
  <name> ukmo-obs </name>
  <access-requires> researcher </access-requires>
  <access-requires> ukmo-obs </access-requires>
  <processing-requires> nerc </processing-requires>
</dataset>
• Key concept: only hosts that trust each other share data, even within a larger virtual organisation. For example, at BADC:
<trusted>
  <bodc>
    <host>ndg.bodc.nerc.ac.uk</host>
    <attribute remotename="nerc"> nerc </attribute>
    <attribute remotename="ashoe"> ashoe </attribute>
    <attribute remotename="staff"> nerc </attribute>
    <other> bodc </other>
  </bodc>
</trusted>
Note: a signed “conditions of use” form exists for this dataset.
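A minimal sketch of how the two records above could be evaluated together: a remote user's attributes are first translated through the trusted-host table into local roles, and those roles are then checked against the dataset's requirements. Two assumptions are made that the slide does not state: every `<access-requires>` (and `<processing-requires>`) entry must be held, and the `<other>` element is simply not modelled. The dict layout is a hypothetical in-memory form of the XML.

```python
from __future__ import annotations

# The <dataset> record above, as a Python dict (hypothetical in-memory form).
UKMO_OBS = {
    "host": "badc.nerc.ac.uk",
    "name": "ukmo-obs",
    "access_requires": {"researcher", "ukmo-obs"},
    "processing_requires": {"nerc"},
}

# BADC's <trusted> table: trusted host -> {remote attribute: local role}.
# The <other> element is not modelled in this sketch.
TRUSTED = {
    "ndg.bodc.nerc.ac.uk": {"nerc": "nerc", "ashoe": "ashoe", "staff": "nerc"},
}


def local_roles(host: str, remote_attrs: set[str]) -> set[str]:
    """Map a remote user's attributes to local roles.

    An untrusted host maps to no roles at all, so nothing is shared with it.
    """
    mapping = TRUSTED.get(host)
    if mapping is None:
        return set()
    return {mapping[a] for a in remote_attrs if a in mapping}


def may_access(roles: set[str], dataset: dict) -> bool:
    # Assumption: every <access-requires> entry must be held.
    return dataset["access_requires"] <= roles


def may_process(roles: set[str], dataset: dict) -> bool:
    # Assumption: every <processing-requires> entry must be held.
    return dataset["processing_requires"] <= roles
```

For example, BODC “staff” arrive at BADC holding the local role `nerc`, which satisfies the processing requirement but not the access requirements of ukmo-obs.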
NDG Security
Certificate-based: encrypted credentials are passed between the user and the gatekeeper.
Extending the CF convention for chemistry …
$ grep -i sulphate vars2.csv
"Allen, Andrew and Grenfell, Lee ", Sulphate / coarse (ug/m3)
"Allen, Andrew and Grenfell, Lee ", Sulphate / fine (ug/m3)
"Bradbury, Carl ", SULPHATE LOADING (ug/m3)
"James, Jonathan And Allen, Andrew ", Sulphate / coarse (ug/m3)
"James, Jonathan And Allen, Andrew ", Sulphate / fine (ug/m3)
"James, Jonathan And Allen, Andrew ", Sulphate / fine+coarse (ug/m3)
"James, Jonathan And Allen, Andrew ", Sulphate / fine+coarse (ug/m3)
"McArdle, Nicola and Thompson, Adrian ", sulphate (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate (µM)
"McArdle, Nicola and Thompson, Adrian ", sulphate <1.1 µm diameter (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate <1.2 µm diameter (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate <1µm diameter (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate > 1µm diameter (nmol-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate >1.1 µm diameter (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate >1.2 µm diameter (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate bulk (nmol m-3)
"McArdle, Nicola and Thompson, Adrian ", sulphate bulk (nmol m-3)
"McFadyen, Gordon ", Sulphate
"Robertson, Leonie and Davison, Brian ", Coarse sulphate concentration (ug m-3)
"Robertson, Leonie and Davison, Brian ", Fine sulphate concentration (ug m-3)
• Currently 35,000 Ames format files, mostly atmospheric chemistry …
• Real problems with vocabulary and units …
• Spinning up a new project …
• … need community help!
$ grep -i butane vars2.csv
… i-Butane (ppt)
… iso-Butane (ppt)
… n-Butane (ppt)
… iso-Butane pptv
… ISO-BUTANE (pptv)
… i-,n-butane
ACSOE is just one of many datasets with this problem …
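The grep output shows the two halves of the problem: the same quantity under different names, and the same unit under different spellings. A first-pass normaliser can split each label into a name and a unit and map known variants onto one spelling, as sketched below. The synonym and unit tables are hypothetical placeholders, not an agreed vocabulary such as CF standard names, and reading “ppt” as parts per trillion by volume is an assumption. Labels it cannot resolve (such as `i-,n-butane`) pass through unchanged for manual review, which is exactly where community help is needed.

```python
from __future__ import annotations

import re

# Hypothetical synonym table: lower-cased name variant -> canonical name.
SYNONYMS = {
    "i-butane": "isobutane",
    "iso-butane": "isobutane",
}

# Hypothetical unit spellings -> one canonical unit string.
UNITS = {
    "ug/m3": "ug m-3",
    "ug m-3": "ug m-3",
    "ppt": "pptv",  # assumption: parts per trillion by volume
    "pptv": "pptv",
    "nmol m-3": "nmol m-3",
}


def split_label(label: str) -> tuple[str, str | None]:
    """Split 'Name (unit)' or 'Name unit' into (name, unit or None)."""
    label = label.strip()
    m = re.fullmatch(r"(.*?)\s*\(([^()]*)\)", label)
    if m:
        return m.group(1), m.group(2)
    # Fall back to a trailing token that is a known unit, e.g. 'iso-Butane pptv'.
    head, _, tail = label.rpartition(" ")
    if head and tail.lower() in UNITS:
        return head, tail
    return label, None


def normalise(label: str) -> tuple[str, str | None]:
    """Return (canonical name, canonical unit); unknown values pass through."""
    name, unit = split_label(label)
    key = name.lower()
    canon_unit = UNITS.get(unit.lower(), unit) if unit else None
    return SYNONYMS.get(key, key), canon_unit
```

Even after this step, “Sulphate / fine” and “Fine sulphate concentration” share a unit but not a name, which is why a shared vocabulary is still required.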
DRAFT DIF Component (1)
Key new groups:
• Numerical Model
  – ID information
  – Numerical model components, from: Atmosphere, Ocean-Dynamic, Ocean-Thermodynamic, Cryosphere, Land-Surface, with possible appends: Chemistry, 4D-VAR, 3D-VAR, QG
  – … details for each
• Numerical Simulation
  – ID information
  – Initial condition information … details
  – Forcing information … details
DRAFT DIF Component (2)
*=required
+=repeatable
Group: Numerical_Model
  Model_Name:
  Model_Version:
  * Model_Calendar: [model calendar valid] - e.g. CF calendar or ISO
  *+ Group: Model_Component
    * Model_Component_type: [Model Component Valid]
    Model_Component_Resolution:
    Group: Model_Component_VerticalDomain
      VerticalDomain_Top:
      VerticalDomain_Bottom:
    End_Group
    Model_Component_Timestep:
    Group: Model_Component_Summary
      [Multiple text lines allowed]
    End_Group
  End_Group
  URL:
End_Group
Example:
Group: Numerical_Model
  Model_Name: HadCM3
  Model_Calendar: 360 day
  Group: Model_Component
    Model_Component_type: Atmosphere
    Model_Component_Resolution: 2.5 degrees latitude, 3.75 degrees longitude
    Group: Model_Component_VerticalDomain
      VerticalDomain_Top: 4 hPa
      VerticalDomain_Bottom: 1000 hPa
    End_Group
    Model_Component_Timestep: 0.5 hours
    Group: Model_Component_Summary
      Cullen et al. atmospheric model adapted for climate use.
    End_Group
  End_Group
  Group: Model_Component
    Model_Component_type: Ocean
    Model_Component_Resolution: 1 degree latitude, 1 degree longitude
    Group: Model_Component_VerticalDomain
      VerticalDomain_Top: 0m
      VerticalDomain_Bottom: 6000m
    End_Group
    Model_Component_Timestep: 0.5 hours
    Group: Model_Component_Summary
      Bryan and Cox ocean model
    End_Group
  End_Group
  URL: http://www.metoffice.com/hadcm3 (e.g.)
End_Group
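Because the draft uses a simple `Group: … End_Group` nesting, records like the example above can be machine-checked. The sketch below parses that syntax into nested dicts (repeated groups or fields become lists; free text inside a summary group is stored under a `text` key) and then checks the `*` fields of the Numerical_Model template. The parser is also tolerant of a `Group Name:` header variant. The function names and the dict representation are hypothetical, not part of any DIF tooling.

```python
from __future__ import annotations


def _add(node: dict, key: str, value) -> None:
    """Store value under key; a repeated key (a '+' entry) becomes a list."""
    if key not in node:
        node[key] = value
    elif isinstance(node[key], list):
        node[key].append(value)
    else:
        node[key] = [node[key], value]


def parse_dif(text: str) -> dict:
    """Parse 'Group: X ... End_Group' blocks into nested dicts."""
    root: dict = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "End_Group":
            if len(stack) == 1:
                raise ValueError("End_Group without a matching Group")
            stack.pop()
        elif line.startswith("Group:") or (line.startswith("Group ") and line.endswith(":")):
            # Accept both 'Group: Name' and the 'Group Name:' variant.
            name = line[len("Group"):].strip().strip(":").strip()
            child: dict = {}
            _add(stack[-1], name, child)
            stack.append(child)
        elif ":" in line:
            key, _, value = line.partition(":")
            _add(stack[-1], key.strip(), value.strip())
        else:
            # Free text, e.g. the body of a *_Summary group.
            _add(stack[-1], "text", line)
    if len(stack) != 1:
        raise ValueError("Group left unclosed")
    return root


def missing_required(model: dict) -> list[str]:
    """List the '*' fields of the Numerical_Model template that are absent."""
    missing = []
    if "Model_Calendar" not in model:
        missing.append("Model_Calendar")
    components = model.get("Model_Component", [])
    if isinstance(components, dict):
        components = [components]
    if not components:
        missing.append("Model_Component")
    for i, comp in enumerate(components):
        if "Model_Component_type" not in comp:
            missing.append(f"Model_Component[{i}].Model_Component_type")
    return missing


# A shortened form of the HadCM3 example.
EXAMPLE = """\
Group: Numerical_Model
Model_Name: HadCM3
Model_Calendar: 360 day
Group: Model_Component
Model_Component_type: Atmosphere
Group: Model_Component_Summary
Cullen et al. atmospheric model adapted for climate use.
End_Group
End_Group
Group: Model_Component
Model_Component_type: Ocean
End_Group
URL: http://www.metoffice.com/hadcm3
End_Group
"""
```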
DRAFT DIF Component (3)
*=required
+=repeatable
Group: Numerical_Simulation
  Numerical_Simulation_Name:
  * Numerical_Simulation_ID: [recorded using PURL]
  * Group: run_period
    * start_date: [yyyy-mm-dd-hh]
    * end_date: [yyyy-mm-dd-hh]
    * real_date: [yes,no]
  End_Group
  * Group: Initial_Condition
    Ensemble: [Numeric Value]
    Summary:
  End_Group
  *+ Group: Forcing
    Ensemble: [Numeric Value]
    Ensemble Parent: [uri or 0]
    Summary:
  End_Group
End_Group
Example:
Group: Numerical_Simulation
  Numerical_Simulation_Name: All Forcings
  Numerical_Simulation_ID: format_tbd_but_using_purl
  Group: run_period
    start_date: 1859-12-01-00
    end_date: 1999-11-30-00
    real_date: yes
  End_Group
  Group: Initial_Condition
    Ensemble: 4
    Ensemble Parent: 0
    Summary: initial conditions taken from the HadCM3 control integration
  End_Group
  Group: Forcing
    Summary: Volcanic forcing from Sato et al.
  End_Group
  Group: Forcing
    Summary: Solar forcing from Lean et al.
  End_Group
  Group: Forcing
    Summary: CO2 from...
  End_Group
  Group: Forcing
    Summary: Anthropogenic SO2 from...
  End_Group
End_Group

Example ensemble member:
Group: Numerical_Simulation
  Numerical_Simulation_Name: Ensemble Member of blah
  Group: run_period
    start_date: 1859-12-01-00
    end_date: 1999-11-30-00
    real_date: yes
  End_Group
  Group: Initial_Condition
    Ensemble: 1
    Ensemble Parent: format_tbd_but_using_purl
    Summary: initial conditions taken from the HadCM3 control integration
  End_Group
  Group: Forcing
    Summary: Volcanic forcing from Sato et al.
  End_Group
  Group: Forcing
    Summary: Solar forcing from Lean et al.
  End_Group
  Group: Forcing
    Summary: CO2 from...
  End_Group
  Group: Forcing
    Summary: Anthropogenic SO2 from...
  End_Group
End_Group
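The run_period dates use a `yyyy-mm-dd-hh` form, and model runs often use non-standard calendars (the HadCM3 example declares a 360-day calendar). A validator therefore cannot simply call `datetime.strptime`, which would reject a legal 360-day date such as 30 February. The sketch below checks the format, checks the day against the declared calendar, and requires the start not to follow the end. The calendar names `gregorian` and `360_day` follow CF-style naming and are an assumption about how the calendar field would be encoded; Python's `calendar` module is proleptic Gregorian.

```python
from __future__ import annotations

import calendar
import re

RUN_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})-(\d{2})")


def parse_run_date(text: str, cal: str = "gregorian") -> tuple[int, int, int, int]:
    """Parse a yyyy-mm-dd-hh run_period date into (year, month, day, hour)."""
    m = RUN_DATE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"expected yyyy-mm-dd-hh, got {text!r}")
    year, month, day, hour = (int(g) for g in m.groups())
    if not (1 <= month <= 12) or not (0 <= hour <= 23):
        raise ValueError(f"month or hour out of range in {text!r}")
    # A 360-day calendar has twelve 30-day months, so 30 February is legal there.
    days = 30 if cal == "360_day" else calendar.monthrange(year, month)[1]
    if not (1 <= day <= days):
        raise ValueError(f"day out of range for the {cal} calendar in {text!r}")
    return year, month, day, hour


def check_run_period(start: str, end: str, cal: str = "gregorian"):
    """Validate both dates; tuples compare field by field, so s <= e is chronological."""
    s, e = parse_run_date(start, cal), parse_run_date(end, cal)
    if s > e:
        raise ValueError("run_period ends before it starts")
    return s, e
```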
(B) Metadata Model
[Figure: entities Activity, Observation station types, Data production tools, Basic data entities, Dataset types, Derived data entities, Data granules and Common data entities (dimensions, spatial/temporal, grids, organisations, people, places/areas), linked by includes/included-in, deploys-a/deployed-on-a, produces/output-by, produces/output-at and can-be-aggregated-in relationships]
(B) Metadata Model Overview
[Figure: tiered entity model]
• Tier-0: Activity
• Tier-1: Observation station types (stationary, moving)
• Tier-2: Data production tools (instrument, model)
• Tier-3: Dataset types (measurement, simulation, analysis)
• Tier-4: Basic data entities (section, profile, Lagrangian path, sample, ensemble, N-dimensional dataset)
• Tier-5: Derived data entities (timeseries, climatology, integration)
• Common entities: spatiotemporal entities (grid, time, trajectory, point, area, place), person, organisation, role
Relationships include includes/included-in, collects/can-be-collected-in, deploys-a/deployed-on-a, produces/output-by, integrated-into, can-be-aggregated-in, is-time-ordered-series-of, follows-a, superset-of/subset-of and processed-to-a. The legend distinguishes inter-tier and directed relationships, entities with a DIF record and data-owning entities; spatial dimensions correspond to GIS/ISO feature types.
(A) NDG Semantic Data Model
[Figure: a dataset contains variables; each variable is a multidimensional array, which may be built from other arrays or from aggregated storage, with rich spatiotemporal referencing (standards-compliant: ISO 19108, ISO 19111)]