NERC DataGrid Vocabulary Server Use Cases
Vocabulary Workshop, RAL, February 25, 2009
Jan 23, 2016
Use Cases
Metadata population with verifiable content
Dynamic drop-down lists
Semantic cross-walk
Smart discovery
Vocabulary Server usage models
Metadata Population Use Case
SeaDataNet is an EU project building a distributed data system across 30-40 European and Mediterranean data centres
Semantic infrastructure provided by NDG Vocabulary Server
SeaSearch was a precursor project federating metadata across a slightly smaller network
SeaSearch was plagued by local vocabulary maintenance allowing illegal values into documents
SeaDataNet adopted two strategies to address this
Strategy 1: constraint through tooling
Provide a metadata editor that allows manual entry of XML metadata records and exports a simple RDBMS schema into XML
Link this up to the Vocabulary Server API to populate drop-down lists and to verify fields populated from vocabularies as they are output
The SeaDataNet Mikado tool does this
Strategy 2: constraint through validation
The problem is that not everybody uses the tools
SeaDataNet metadata documents include Schematron code to validate field content
The Schematron is maintained by software polling the Vocabulary Server API
Records are validated at source using a Schematron-aware tool (e.g. Oxygen 8 or later) or an on-line validation service
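The validation idea can be illustrated with a minimal Python sketch. This is not Schematron and not the SeaDataNet validation service: it only shows the underlying check, that every vocabulary-controlled field holds a value from the current term list. The element name `parameter`, the record layout, and the hard-coded term set are assumptions made for illustration; in practice the term set would be refreshed by polling the Vocabulary Server.

```python
import xml.etree.ElementTree as ET

# Hypothetical snapshot of legal terms per controlled field. In a real
# system this would be rebuilt by polling the Vocabulary Server.
LEGAL_TERMS = {"parameter": {"TEMP", "PSAL", "CPWC"}}

# Hypothetical metadata record; the layout is an assumption for this sketch.
RECORD = """<record>
  <parameter>TEMP</parameter>
  <parameter>SALT</parameter>
</record>"""


def find_illegal_values(record_xml, legal_terms):
    """Return (field, value) pairs whose value is not in the vocabulary."""
    root = ET.fromstring(record_xml)
    problems = []
    for field, allowed in legal_terms.items():
        for element in root.iter(field):
            value = (element.text or "").strip()
            if value not in allowed:
                problems.append((field, value))
    return problems


if __name__ == "__main__":
    for field, value in find_illegal_values(RECORD, LEGAL_TERMS):
        print(f"Illegal value {value!r} in field {field!r}")
```

A record with no problems yields an empty list, so the same function can gate document export in an editor or run as a batch check over existing records.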
Dynamic Drop-Down List Use Case
SeaDataNet marks up data using BODC Parameter Usage Vocabulary (21000 terms)
Navigation of something this size is a potential issue
Addressed by building three layers of increasingly broad terms over the top
Layers linked together using SKOS mappings
Search client required to exploit this
An obvious design for this is a series of drop-down lists working down the hierarchy
These need to be dynamically populated to keep up to date with the master vocabulary versions
The following URL gives all terms from the top level hierarchy:
http://vocab.ndg.nerc.ac.uk/list/P081/current
This may be used to set up a list of hot-linked labels pointing to Vocabulary Server concept URLs such as:
http://vocab.ndg.nerc.ac.uk/term/P081/3/DS02
This URL represents the concept ‘chemical oceanography’
When it is selected by the user, a Vocabulary Server call is issued and we get a SKOS document thus:
<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P081/3/DS02">
    <skos:externalID>SDN:P081:3:DS02</skos:externalID>
    <skos:prefLabel>Chemical oceanography</skos:prefLabel>
    <skos:altLabel>Chemical oceanography</skos:altLabel>
    <skos:definition>The chemical oceanographic science domain</skos:definition>
    <dc:date>2009-02-10T10:30:20.052+0000</dc:date>
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/B007" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C003" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C005" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C010" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C015" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C017" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C020" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C025" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C030" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C035" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C040" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C045" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C050" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C055" />
  </skos:Concept>
</rdf:RDF>
This delivers a set of URIs from the next level down in the hierarchy
Again, these may be displayed as hot-linked labels, and again the user selects one to drill down into the next layer of the hierarchy through another Vocabulary Server (VS) call
Maris BV in the Netherlands have linked this to Ajax to produce a client
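The drill-down step can be sketched in Python using only the standard library. This is an illustrative sketch, not the Maris client: it parses a SKOS document of the shape shown in this use case and pulls out the concept label and the narrowMatch URIs that would populate the next drop-down list. The sample document is abbreviated to two narrowMatch entries, and the function name `parse_concept` is hypothetical. Fetching the document over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes as declared in the SKOS document returned by the server.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Abbreviated copy of the 'chemical oceanography' document, for illustration.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P081/3/DS02">
    <skos:prefLabel>Chemical oceanography</skos:prefLabel>
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/B007" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C003" />
  </skos:Concept>
</rdf:RDF>"""


def parse_concept(skos_xml):
    """Extract the URI, preferred label, and narrower-term URIs of a concept."""
    root = ET.fromstring(skos_xml)
    concept = root.find("skos:Concept", NS)
    if concept is None:
        raise ValueError("no skos:Concept element found")
    return {
        "uri": concept.get(RDF_ABOUT),
        "label": concept.findtext("skos:prefLabel", namespaces=NS),
        "narrower": [m.get(RDF_RESOURCE)
                     for m in concept.findall("skos:narrowMatch", NS)],
    }


if __name__ == "__main__":
    concept = parse_concept(SAMPLE)
    print(concept["label"])
    for uri in concept["narrower"]:
        print("  next-level term:", uri)
```

Each URI in `narrower` can be dereferenced in turn to obtain the label for the next list, which is the one-call-per-interaction pattern described above.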
Semantic Crosswalk Use Case
BODC wishes to produce a GCMD DIF document from an EDMED V1.2 document
The “parameter” sections of the two documents are populated using different vocabularies (BODC PDV and GCMD Science Keywords)
This situation was usually addressed by having no parameter section in the output document. We can now do better…
A list of BODC PDV terms as parameter URNs is obtained from the EDMED document, for example:
SDN:P021:24:TEMP
SDN:P021:24:PSAL
SDN:P021:24:CPWC
This may then be translated into a list of URLs:
http://vocab.ndg.nerc.ac.uk/term/P021/24/TEMP
http://vocab.ndg.nerc.ac.uk/term/P021/24/PSAL
http://vocab.ndg.nerc.ac.uk/term/P021/24/CPWC
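The URN-to-URL translation is mechanical and can be sketched in a few lines of Python. The mapping rule used here (SDN:list:version:term becomes .../term/list/version/term) is inferred from the pairing of the external ID SDN:P081:3:DS02 with the concept URL .../term/P081/3/DS02 in the SKOS document shown earlier, so treat it as an assumption rather than a documented API rule. The function name `urn_to_url` is hypothetical.

```python
BASE_URL = "http://vocab.ndg.nerc.ac.uk/term"


def urn_to_url(urn):
    """Translate an 'SDN:<list>:<version>:<term>' URN into a concept URL.

    The mapping is an assumption inferred from examples in the slides.
    """
    parts = urn.split(":")
    if len(parts) != 4 or parts[0] != "SDN":
        raise ValueError(f"unexpected URN format: {urn!r}")
    _, vocabulary, version, term = parts
    return f"{BASE_URL}/{vocabulary}/{version}/{term}"


if __name__ == "__main__":
    for urn in ("SDN:P021:24:TEMP", "SDN:P021:24:PSAL", "SDN:P021:24:CPWC"):
        print(urn_to_url(urn))
```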
This list may be rolled into an HTTP GET request thus:
http://vocab.ndg.nerc.ac.uk/axis2/services/vocab/getRelatedRecordByTerm?subjectTerm=http://vocab.ndg.nerc.ac.uk/term/P021/current/TEMP&subjectTerm=http://vocab.ndg.nerc.ac.uk/term/P021/current/PSAL&subjectTerm=http://vocab.ndg.nerc.ac.uk/term/P021/current/CPWC&objectList=http://vocab.ndg.nerc.ac.uk/list/P041/current&predicate=255&inferences=true
An XML document is returned containing the GCMD Science Keywords that map to the three BODC terms as both text strings and URLs
The document may be reformatted using XSLT or XQuery to generate the “parameters” section for the DIF
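Assembling the crosswalk request can be sketched with Python's standard `urllib.parse`. The service path and parameter names (`subjectTerm`, `objectList`, `predicate`, `inferences`) are taken from the example URL in this use case; the function name `build_crosswalk_url` and its defaults are hypothetical. The sketch only builds the request string and does not contact the server.

```python
from urllib.parse import urlencode

SERVICE = ("http://vocab.ndg.nerc.ac.uk/axis2/services/vocab/"
           "getRelatedRecordByTerm")


def build_crosswalk_url(subject_terms, object_list, predicate=255,
                        inferences=True):
    """Build the GET request mapping subject terms onto a target list."""
    # One subjectTerm parameter per source term, in the order given.
    params = [("subjectTerm", term) for term in subject_terms]
    params.append(("objectList", object_list))
    params.append(("predicate", str(predicate)))
    params.append(("inferences", "true" if inferences else "false"))
    # Leave ':' and '/' unescaped so the URL matches the form in the slides.
    return SERVICE + "?" + urlencode(params, safe=":/")


if __name__ == "__main__":
    terms = [
        "http://vocab.ndg.nerc.ac.uk/term/P021/current/TEMP",
        "http://vocab.ndg.nerc.ac.uk/term/P021/current/PSAL",
        "http://vocab.ndg.nerc.ac.uk/term/P021/current/CPWC",
    ]
    print(build_crosswalk_url(
        terms, "http://vocab.ndg.nerc.ac.uk/list/P041/current"))
```

The returned XML would then be passed to whatever XSLT or XQuery step generates the DIF “parameters” section.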
Smart Discovery Use Case
Ability to find datasets tagged ‘rainfall’ using the search term ‘precipitation’
Also includes so-called ‘faceted searches’
Find one ‘type of thing’ by searching for another ‘type of thing’
For example: find datasets tagged ‘CTD’ (an instrument type) using the search term ‘salinity’ (a phenomenon)
Requires the semantically rich relation ‘Salinity measuredBy CTD’
System needs to understand ‘measuredBy’ (requires rules)
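A toy Python sketch shows how a rule over a relation such as ‘measuredBy’ turns one search term into several. Everything here is hypothetical: the triples, the `sameAs` predicate name, the dataset names and their tags are invented for illustration, and the "rule" is simply a list of predicates the search is allowed to follow. It is not the Vocabulary Server's Smart Discovery implementation.

```python
# Hypothetical triples (subject, predicate, object).
TRIPLES = {
    ("precipitation", "sameAs", "rainfall"),
    ("salinity", "measuredBy", "CTD"),
}

# Hypothetical datasets and the terms they are tagged with.
DATASETS = {
    "dataset-A": {"rainfall"},
    "dataset-B": {"CTD"},
    "dataset-C": {"chlorophyll"},
}

# The 'rule': predicates that may be followed when expanding a search term.
FOLLOWED_PREDICATES = {"sameAs", "measuredBy"}


def expand(term):
    """Return the term plus every term reachable via followed predicates."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in TRIPLES:
            if (subject == current and predicate in FOLLOWED_PREDICATES
                    and obj not in found):
                found.add(obj)
                frontier.append(obj)
    return found


def discover(term):
    """Return names of datasets tagged with the term or any expansion of it."""
    tags = expand(term)
    return sorted(name for name, dataset_tags in DATASETS.items()
                  if tags & dataset_tags)


if __name__ == "__main__":
    print(discover("precipitation"))
    print(discover("salinity"))
```

Triples are followed only from subject to object in this sketch; a symmetric relation would need either both directions stored or a rule that says so, which is the kind of knowledge an inference engine supplies.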
Operational Smart Discovery requires:
An extensively populated full-blown ontology
A state of the art inference engine
VS API has Smart Discovery support methods
Based on SQL search on relational triple store
Inference functionality would need a locally-developed inference engine
Produces impressive demonstrations but is not scalable to operational use
VS Usage Models
The dynamic drop-down list use case may be implemented in at least three ways
1. Client issues a VS call on each user interaction returning a relatively small XML document
2. Client uses one VS call to download the entire thesaurus into an RDF-aware tool and then interacts through a local API
3. Entire thesaurus loaded into RDF-aware tool on the server that is interrogated by the client through something like SPARQL
Method 1
Experience shows it to work well for the first three use cases
Smart Discovery could potentially require hundreds of server calls per query
Method 2
Requires a thick client
Could be part of an installed package
Provides access to inference engines
Well-suited to Smart Discovery
Untested as far as we know
Method 3
Being developed by Marine Metadata Interoperability (MMI) project based on OWL rather than SKOS.
Provides access to inference engines
Well-suited to Smart Discovery support