Jan 19, 2016
Bryan Lawrence, BADC
David Boyd, Deputy Director CLRC e-Science centre
Kerstin Kleese, DL: Climate Database Expert
Roy Lowry, BODC: Marine Database Expert
Dean Williams, PCMDI: ESG Principal Investigator
Bob Drach, PCMDI: ESG Metadata Architecture
Mike Fiorino, PCMDI: Meteorologist

Acronym Summary:
PCMDI: Program for Climate Model Diagnosis and Intercomparison (US Department of Energy, Lawrence Livermore National Lab)
ESG: Earth System Grid (US Grid Project: NCAR, Argonne, PCMDI, USC)
Outline
- Motivation
- The Earth System Grid
  - definitions of portals and applications
  - ontologies
- Relations with other NERC e-science programmes
- Architecture
  - querying
  - software stack
- Initial steps and Project Management
- Connectivity with other grid projects
- Success and Failure
- Summary of what we are doing and the road to the future
The BADC part of NCAS!
The Role: Key words: Curation and Facilitation!
http://www.badc.rl.ac.uk
Just under half of BADC users are NOT atmospheric scientists:
[Figures: Registered Non-Atmospheric Science Users; User Queries by Discipline (Number of Queries per Semester; categories include FTP/Browse Archive problem, Other computer problem, Other data problem, Terrestrial and Freshwater)]
Motivation
Town meeting 2001
E-science should be involved with:
- delivering an enhanced metadata record of archived data.
- 'dictionary' building.
- building systems to translate data and link databases.
- integrating computer and natural science communities.
- the ability to generate a single query across multiple datasets (in different catalogues) returning both metadata and data.
- the ability to acquire large datasets in near real time (NRT).
- the automatic production of metadata, both by models, and where possible, by observing systems.
Summary from two of the four working groups!
Relevant to many stakeholders
(Slide from Julia Slingo's introduction to CGAM as part of NCAS)
Motivation
Page 22: NERC will ... ensure that Earth system science is underpinned by e-science investments to enable access, manipulation of data from diverse sources.
The Data Use Chain
NERC Metadata Gateway - SST
Geospatial coordinates forgotten. Time reference forgotten. Need to get entire field(s), and find correct time!
And if I want to compare data from different locations?
- multiple logins
- multiple formats
- discovery?
Searching: need comprehensive metadata!
A priori, would any user know to look in the COAPEC data set? Earth system science means we have to remove these boundaries!
- detailed file-level metadata isn't visible, and so data mining applications are impossible.
- need ontologies to help queries match actual data descriptions.
NB: Dynamic catalogues!
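As a rough illustration of how a controlled vocabulary could help a query match the terms actually used in data descriptions, here is a minimal Python sketch. The thesaurus entries, record ids and descriptions are all invented for the example, and simple substring matching stands in for whatever a real catalogue search would do.

```python
# Hypothetical thesaurus: a preferred term and the synonyms a catalogue might use.
THESAURUS = {
    "sea surface temperature": {"sst", "sea_surface_temperature"},
}

# Illustrative records only; ids and descriptions are invented for this sketch.
RECORDS = {
    "dataset-a": "Monthly SST analysis on a regular grid",
    "dataset-b": "Sea surface temperature from moored buoys",
    "dataset-c": "Stratospheric ozone profiles",
}


def expand_query(term, thesaurus=THESAURUS):
    """Return every term the thesaurus treats as equivalent to `term`."""
    term = term.strip().lower()
    for preferred, synonyms in thesaurus.items():
        if term == preferred or term in synonyms:
            return {preferred} | synonyms
    return {term}  # unknown terms are searched for as given


def search(term, records=RECORDS, thesaurus=THESAURUS):
    """Return ids of records whose description mentions any equivalent term."""
    terms = expand_query(term, thesaurus)
    return [rec_id for rec_id, description in records.items()
            if any(t in description.lower() for t in terms)]


print(search("SST"))
```

A user who types only "SST" also finds the record described with the full phrase, which is the kind of matching that is hard without shared terminology.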
What is an Ontology?
An ontology defines the terms used to describe and represent an area of knowledge by specifying the following kinds of concepts:
- Classes (general things) in the many domains of interest
- The relationships that can exist among things
- The properties (or attributes) those things may have

Ontologies are usually expressed in a logic-based language, so that detailed, accurate, consistent, sound, and meaningful distinctions can be made among the classes, properties, and relations.
Ontology Example:
An example of part of an ontology defined using OIL (e.g. see "OIL in a Nutshell", D. Fensel et al.)

    ontology-definitions
    slot-def eats
      inverse is-eaten-by
    slot-def has-part
      inverse is-part-of
      properties transitive
    class-def animal
    class-def plant
      subclass-of NOT animal
    class-def tree
      subclass-of plant
    class-def branch
      slot-constraint is-part-of
        has-value tree
    class-def leaf
      slot-constraint is-part-of
        has-value branch
    class-def defined carnivore
      subclass-of animal
      slot-constraint eats
        value-type animal
    class-def defined herbivore
      subclass-of animal
      slot-constraint eats
        value-type plant OR (slot-constraint is-part-of has-value plant)
    class-def giraffe
      subclass-of animal
      slot-constraint eats
        value-type leaf
    class-def lion
      subclass-of animal
      slot-constraint eats
        value-type herbivore

With current funding, the NDG does not aim to build a formal ontology, but we do aim to begin to build a thesaurus that can form the basis of one, and we do hope to spin off a project to build one and integrate it in the NDG.
Relationships, Classes, Properties
(OIL: Ontology Inference Layer)
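To make the OIL example a little more concrete, the following Python sketch encodes the same classes and relationships as plain dictionaries and performs two simple inferences. It is only an illustration under strong simplifying assumptions: each class has at most one parent, one whole and one food, and the helper names (`is_a`, `is_part_of`, `is_herbivore`, `is_carnivore`) are invented here. A real OIL reasoner uses description logic and is far more general.

```python
# Classes and relationships from the OIL example, as simple lookup tables.
SUBCLASS_OF = {"tree": "plant", "herbivore": "animal", "carnivore": "animal",
               "giraffe": "animal", "lion": "animal"}
PART_OF = {"branch": "tree", "leaf": "branch"}   # is-part-of, treated as transitive
EATS = {"giraffe": "leaf", "lion": "herbivore"}  # value-type of the eats slot


def is_a(cls, ancestor):
    """True if cls is ancestor or reaches it by following subclass-of links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def is_part_of(cls, whole):
    """Follow is-part-of transitively until some enclosing whole is-a `whole`."""
    part = PART_OF.get(cls)
    while part is not None:
        if is_a(part, whole):
            return True
        part = PART_OF.get(part)
    return False


def is_herbivore(cls):
    """An animal whose food is a plant or part of a plant."""
    food = EATS.get(cls)
    return (is_a(cls, "animal") and food is not None
            and (is_a(food, "plant") or is_part_of(food, "plant")))


def is_carnivore(cls):
    """An animal whose food is an animal."""
    food = EATS.get(cls)
    return is_a(cls, "animal") and food is not None and is_a(food, "animal")


print(is_herbivore("giraffe"), is_carnivore("lion"))
```

The giraffe is never declared a herbivore. That follows from leaf being part of a branch, branch being part of a tree, and tree being a plant, which is the kind of inference a formal ontology makes available to a query.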
ESG: Example of a Web-based Data Portal
ESG will provide support for:
- large but simple data sets, with limited metadata that is not searchable.
NDG will provide support for:
- Small-but-complex datasets.
- Data-mining (searchable metadata).
NDG is complementary to ESG!
Live Access Server (1)
We will keep the basic structure, but gradually replace components.
Live Access Server (2)
Data Request Structure:
ESG: Example of a Client Application
We will:
- Provide Python-based classes for our observational data to complement the access to 3D gridded data.
- Provide a web services wrapper so that other grid applications can access NDG data.
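The slides do not show what such Python classes would look like, so the following is purely a hypothetical sketch of one possibility: a small container for a fixed-location observational time series that keeps its coordinates and time reference attached to the values. The class name `PointTimeSeries` and all of its fields and methods are assumptions made for illustration, not part of any NDG or ESG software.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PointTimeSeries:
    """Hypothetical container for an observational record at one location."""
    variable: str
    units: str
    latitude: float
    longitude: float
    times: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, time, value):
        """Add one observation, keeping time and value aligned."""
        self.times.append(time)
        self.values.append(value)

    def between(self, start, end):
        """Return a new series restricted to start <= time <= end."""
        subset = PointTimeSeries(self.variable, self.units,
                                 self.latitude, self.longitude)
        for t, v in zip(self.times, self.values):
            if start <= t <= end:
                subset.append(t, v)
        return subset

    def mean(self):
        """Arithmetic mean of the values; raises on an empty series."""
        if not self.values:
            raise ValueError("empty series")
        return sum(self.values) / len(self.values)


# Invented sample data for the usage example.
series = PointTimeSeries("air_temperature", "degC", 51.6, -1.3)
for day, value in [(1, 10.0), (2, 12.0), (3, 14.0)]:
    series.append(datetime(2002, 1, day), value)

late = series.between(datetime(2002, 1, 2), datetime(2002, 1, 3))
print(late.values, late.mean())
```

Because a subset carries the same coordinates and units as its parent, the location and time reference are not lost when a user extracts part of a record.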
Applications and Portals
Relationship to GODIVA (Haines et al.)
(Grid for Ocean Diagnostics, Interactive Visualisation and Analysis)
Architecture of the GODIVA Grid:
NDG will:
- improve data discovery tools for GODIVA (even for their own datasets).
- provide metadata creation tools for GODIVA participants.
- provide access to data held outside GODIVA participants.
ClimatePrediction.com
CP.COM will need the NDG to make best use of observational data in evaluating their parameter space.
Mining on the Grid
From Hinke's NASA IPG presentation at CEOS, Rome, May 2002
Data mining: Grid Miner Architecture
From Hinke's NASA IPG presentation at CEOS, Rome, May 2002
The devil is in the detail: how does the data mining agent get at the data? We need data mining client objects which can read specific datatypes and present themselves to agents!
Finding data: Querying!
Requires databases of metadata and querying those databases.
- Each part of the NDG will have an internal metadata catalogue (and/or database), and data (either in flat files or the database).
- So the querying strategy must support centralised querying on partially indexed data, followed (if necessary) by distributed querying, which may or may not need mapping into a local database schema.
- In the grid environment the indexes themselves will be replicated, and some data may also be replicated.
Major NDG design issue: developing appropriate data models, database schema and indexing strategies!
- This is not a generic problem; it will be specific to our datatypes.
- Technology needs to be public domain (i.e. free) for uptake!
- NDG approach to database technology will be developed in conjunction with DBTF.
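The two-stage strategy described above (a centralised query on a partial index, then distributed queries mapped into each local schema) can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: the site names, catalogue fields, and schema mappings are invented, and in-memory dictionaries stand in for real databases and network calls.

```python
# Central index: only partially describes each site (here, just variable names).
CENTRAL_INDEX = {
    "site-a": {"temperature", "salinity"},
    "site-b": {"temperature", "ozone"},
}

# Each site keeps its own catalogue, with its own field names (local schema).
SITE_CATALOGUES = {
    "site-a": [{"param": "temperature", "yr": 2001, "file": "a1"},
               {"param": "temperature", "yr": 1999, "file": "a2"}],
    "site-b": [{"variable": "temperature", "year": 2001, "file": "b1"},
               {"variable": "ozone", "year": 2001, "file": "b2"}],
}

# Mapping from the common query vocabulary to each site's local field names.
SCHEMA_MAPS = {
    "site-a": {"variable": "param", "year": "yr"},
    "site-b": {"variable": "variable", "year": "year"},
}


def query(variable, year):
    """Return (site, file) pairs matching the query."""
    # Stage 1: centralised query on the partial index narrows the candidates.
    candidates = [site for site, variables in CENTRAL_INDEX.items()
                  if variable in variables]
    hits = []
    # Stage 2: distributed query, translated into each site's local schema.
    for site in candidates:
        fields = SCHEMA_MAPS[site]
        for record in SITE_CATALOGUES[site]:
            if (record[fields["variable"]] == variable
                    and record[fields["year"]] == year):
                hits.append((site, record["file"]))
    return hits


print(query("temperature", 2001))
```

The central index cannot answer the question about the year, which is why the second stage is needed. It does, however, avoid contacting sites that hold none of the requested variable.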
Query Pathway; software components
Information Structure
(Diagram key: PCMDI Components, NDG Components, Joint Interfaces, Existing Components)
Simplified Software Stack
Key point: make use of existing technology, and allow component replacement with time!
Achievable by: interface definition and integration.
Note: Any application will be able to access our data services via the OGSA wrapper in the middleware.
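The idea of allowing component replacement through interface definition can be illustrated with a short Python sketch. The names `Transport`, `ExistingTransport`, `ReplacementTransport` and `DataService` are hypothetical, and the string-returning `send` method merely stands in for a real transport layer.

```python
from typing import Protocol


class Transport(Protocol):
    """Interface that every transport component must satisfy."""

    def send(self, request: str) -> str: ...


class ExistingTransport:
    """Stands in for a component already in the stack."""

    def send(self, request: str) -> str:
        return f"existing({request})"


class ReplacementTransport:
    """Stands in for a later drop-in replacement."""

    def send(self, request: str) -> str:
        return f"replacement({request})"


class DataService:
    """Depends only on the Transport interface, not on a concrete component."""

    def __init__(self, transport: Transport):
        self.transport = transport

    def fetch(self, dataset_id: str) -> str:
        return self.transport.send(f"get:{dataset_id}")


# Swapping the component requires no change to DataService itself.
print(DataService(ExistingTransport()).fetch("sst"))
print(DataService(ReplacementTransport()).fetch("sst"))
```

As long as the interface stays fixed, a layer of the stack can be exchanged without touching the code above it.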
NDG: Ingestion Tasks
Draft Project Schedule
Phase One Delivery
Replace with Globus Giggle?
Next steps include:
- Replacing the transport layers in the metadata gateway with SOAP
- Replacing the SGML in the metadata gateway with XML
- etc.
Indicators of Success
Finding and making use of data:
- Possible to find, reformat, and visualise disparate datasets from disparate organisations within one application.
- No longer necessary to rely on personal contacts to locate and acquire data of interest if it's held in the BADC/BODC.
- Key requirement for interdisciplinarity: the ability to test data comparison ideas without learning foreign formats and establishing personal relationships every time.
- Other NERC designated data centres implementing NDG.
Take up by community:
- NDG software (but not neces