Jan 19, 2016
NERC Data Grid
Helen Snaith and the NDG consortium
What is NDG?
NDG provides the infrastructure which allows users to:
- Find data
- Explore what is known about datasets (including information about the observing or simulating tools)
- Access, manipulate and visualise data!
NDG Services
The NDG Discovery Service
- A database of discovery information
- A website providing a portal to that database
- A set of web services, allowing NDG consumers to exploit that database
The NDG Vocabulary Service
- Databases of environmental thesauri and ontology tools to map between terms
- Supports machine-assisted browsing: e.g. search for "rainfall" and find datasets with "precipitation"
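The "rainfall" to "precipitation" example above can be illustrated with a minimal sketch of thesaurus-assisted search. This is not the NDG Vocabulary Service API: the `THESAURUS` and `DATASETS` tables and the `expand_term` and `search` functions are hypothetical names invented for illustration, assuming only that a vocabulary maps a term to equivalent terms.

```python
# Hypothetical thesaurus: each term maps to the terms treated as equivalent.
THESAURUS = {
    "rainfall": {"precipitation"},
    "precipitation": {"rainfall"},
}

# Hypothetical discovery records: dataset name -> keywords in its metadata.
DATASETS = {
    "dataset-a": {"precipitation", "temperature"},
    "dataset-b": {"salinity"},
}


def expand_term(term):
    """Return the search term together with its thesaurus equivalents."""
    term = term.lower()
    return {term} | THESAURUS.get(term, set())


def search(term):
    """Return the names of datasets whose keywords match any expanded term."""
    wanted = expand_term(term)
    return sorted(name for name, keywords in DATASETS.items() if keywords & wanted)


if __name__ == "__main__":
    # A search for "rainfall" also matches datasets tagged "precipitation".
    print(search("rainfall"))
```

The point of the design is that the mapping lives in the vocabulary, not in each dataset's metadata, so providers can keep their own terms.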
What Can NERC Data Grid Do For You?
Depends on who you are!
- Data User: e.g. scientist looking for data
- Data Provider: e.g. data centre, large research group
- Developer: providing access tools or using data sources
As a Data User?
- Discovery leads directly to data
  - Simple search page or Google-like toolbar gives direct access to data, not just the metadata
- Comprehensive coverage
- Seamless access
- Hides data format differences and provides generic visualisation tools
- Enhances interdisciplinary research
As a Data Provider?
- Provides a standards-based metadata hierarchy
  - Exploit the NDG metadata structures to design a metadata system from the ground up, or use them as an interoperability tool
- No need to run multiple security systems
- Increases visibility and usage of datasets
  - Inclusion in NDG does not prevent inclusion in other data discovery or access initiatives
- Minimise user support load
- NDG Security
As a Developer?
- Exploit NDG web services
- Exploit NDG modular development
- Exploit NDG metadata standards
- NDG developments are non-proprietary
- NDG standards compliance implies less future development
Metadata Framework
- Discovery: ISO standard
- MOLES (Metadata Objects for Links in Environmental Science): detailed metadata for browsing
- CSML (Climate Science Modelling Language): for accessing features within data
CSML in Action
What Can't the NDG Do?
- NDG can't generate the metadata!
  - Automated manipulation and searching rely on comprehensive metadata, which requires quality data management to create and maintain
  - NDG does provide a clear framework for metadata requirements
- NDG doesn't provide information services: it provides data services!
  - NDG does provide a framework within which it is possible to link data to derived information services: research publications, reports, policy etc.
Competitors!
How is NDG different from the rest?
e2edm, End-to-End Data Management:
- Similar aims but using different technologies
- NDG is more standards compliant
Thredds:
- Provides only catalogue services
- Thredds is significantly more mature than NDG
- Suitable for data providers with hierarchical data / metadata
OPeNDAP:
- A protocol for accessing data
- Limited support for access control
- Requires the user to know exactly what the data are
NDG Status
Approaching Beta
- Discovery Gateway
- Discovery Service
- CSML V2
- MOLES
- Security Infrastructure
- Vocabulary Service
- MOLES browser
Population (MOLES & CSML) by NDG partners
Implementing & testing gateways, browsers & interfaces
ndg.nerc.ac.uk
Trac wiki site
Like the web, the NDG has no owner or central control: data remain with data providers, be they
- managed data centres in the UK or abroad
- semi-managed data archives in large research groups
The location of the data can be transparent to the user, while still allowing data providers to maintain their intellectual investment by controlling access.
Although NDG has no centre, it does provide two community services which NDG participants can take advantage of.
Discovery data are harvested from data providers around the world: currently this includes BADC, BODC, PML, NOCS and the Hamburg World Climate Data Centre.
Discovery leads directly to data:
- A simple desktop search gives direct access to data, not just the metadata
- Data can even be recovered directly into applications
Comprehensive coverage:
- Datasets from a wide range of sources from the UK, Europe and further afield
- Even commercial datasets can be visible, even if they cannot be accessed directly
Seamless access:
- Single sign-on; no need to register with each data centre
- NDG provides data access across multiple sources and even allows combination of data from multiple sources
- This is achieved without compromising data security and protection of intellectual investment in data production
Hides data format differences and provides generic visualisation tools:
- No need to develop new reading routines or learn to use new tools for each new data set
- Exploit the Python I/O library for NDG data and integrate it into your own software!
Enhances interdisciplinary research:
- Investigation of metadata can allow discovery of previously unrecognised linkages
Provides a standards-based metadata hierarchy:
- Once you have a mapping from your information model to the NDG model, you can interoperate with NDG metadata browsing and discovery tools (and with other existing and emerging national and international regimes, such as INSPIRE)
No need to run multiple security systems:
- You can build on an existing access control system, or develop a new one, and plug into the NDG single sign-on authentication and authorisation framework
- Logging comes for free!
Increases visibility and usage of datasets:
- Inclusion in NDG does not prevent inclusion in other data discovery or access initiatives
- Even if your data centre or website is down, your data are discoverable through NDG discovery services
Minimise user support load:
- More users can find their data, and use them, without the need for direct support from data provider staff
Exploit NDG web services:
- The architecture of the NDG is based on independent interacting web services
- Build your own, extend the existing services, and deploy them in different ways!
Exploit NDG modular development:
- The underlying NDG data manipulation software is written as Python modules
- Use easy_install to exploit the NDG software and build or port your own data manipulation tools
- Add your favourite graphics package ...
Exploit NDG metadata standards:
- Develop new services confident that they will be deployable and interoperable across a wide range of data types and data providers
NDG developments are non-proprietary:
- NDG does not require compromising on interacting with other projects
- NDG exploits OGC interfaces to give more ways of interacting with data
NDG standards compliance implies less future development:
- You will be able to build on NDG compliance with OGC and ISO protocols to deploy more tools
Exploit the NDG identifier conventions:
- Lodge copies of your data with a NERC designated data centre, and support persistent long-lived data citation
MOLES provides a common and standardized way of describing key aspects of datasets:
- the Activities which generate data
- the Observation Stations at which the data are collected (or produced in the case of simulations)
- the Data Production Tools
- the Data Entities themselves
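The four kinds of MOLES object named above, and the links between them, might be modelled along the following lines. This is only an illustrative sketch and not the MOLES schema: every class, field and value here (`Activity`, `ObservationStation`, `DataProductionTool`, `DataEntity`, `describe`, and the example names) is a hypothetical stand-in chosen to mirror the wording of the notes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Activity:
    """An activity (e.g. a project or campaign) which generates data."""
    name: str


@dataclass
class ObservationStation:
    """Where data are collected, or produced in the case of simulations."""
    name: str


@dataclass
class DataProductionTool:
    """The instrument or model used to produce the data."""
    name: str


@dataclass
class DataEntity:
    """A dataset, linked to the objects that describe how it came about."""
    name: str
    activities: List[Activity] = field(default_factory=list)
    stations: List[ObservationStation] = field(default_factory=list)
    tools: List[DataProductionTool] = field(default_factory=list)


def describe(entity: DataEntity) -> str:
    """Build a one-line browse summary by following the entity's links."""
    activities = ", ".join(a.name for a in entity.activities) or "unknown"
    stations = ", ".join(s.name for s in entity.stations) or "unknown"
    tools = ", ".join(t.name for t in entity.tools) or "unknown"
    return (f"{entity.name}: activity={activities}; "
            f"station={stations}; tool={tools}")


if __name__ == "__main__":
    # Entirely made-up example values.
    entity = DataEntity(
        name="example-profiles",
        activities=[Activity("example-cruise")],
        stations=[ObservationStation("example-ship")],
        tools=[DataProductionTool("example-sensor")],
    )
    print(describe(entity))
```

Keeping the links on the data entity is one possible choice; it lets a browser move from a dataset to the activity, station or tool that produced it.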
The Climate Science Modelling Language (CSML) schema is an XML schema which is an Application Schema of the Geography Markup Language (GML).
CSML provides format-independent descriptions of the parameters and organization of datasets according to the Sampling Features and Observations and Measurements frameworks of the Open Geospatial Consortium (OGC).
Examples of Features:
- Grid
- Grid series
- Profile
- Profile series
The distinction is important!
e2edm
- NDG is based around mature OGC descriptions of environmental data
- e2edm is based around a number of emerging applications
- NDG and e2edm have similar aims but are using different technologies
- Arguably NDG is more standards compliant, and will provide support for a wider range of data sources
Thredds
- Provides catalogue services, primarily limited to file-based access
- Within what it does, Thredds is significantly more mature than NDG
- Very suitable for data providers with limited metadata and/or data
OPeNDAP
- A protocol for accessing data
- Limited support for access control
- Requires the data user to know exactly what the data are before they can be used
NDG status
- Much of the development work is complete: we have the working schemas for MOLES & CSML
- Discovery harvesting is now routine
- Need to link existing metadata & services to the vocabulary servers to ensure consistent vocabulary and allow simple ontology usage
- Security used for MOLES browse and data extraction (CSML GUI) is implemented at data providers (DPs), with custom code to interface to existing authentication methods
- MOLES & extraction are ready to be deployed at other partner sites
- Population!