NERC DataGrid Vocabulary Server Use Cases

Vocabulary Workshop, RAL, February 25, 2009


Use Cases

Metadata population with verifiable content

Dynamic drop-down lists

Semantic cross-walk

Smart discovery

Vocabulary Server usage models


Metadata Population Use Case

SeaDataNet is an EU project building a distributed data system across 30-40 European and Mediterranean data centres

Semantic infrastructure provided by NDG Vocabulary Server

SeaSearch was a precursor project federating metadata across a slightly smaller network

SeaSearch was plagued by local vocabulary maintenance allowing illegal values into documents

SeaDataNet adopted two strategies to address this



Strategy 1: constraint through tooling

Provide a metadata editor that allows manual entry of XML metadata records and exports a simple RDBMS schema into XML

Link this up to the Vocabulary Server API to populate drop-down lists and verify fields populated from vocabularies as they are output

SeaDataNet Mikado tool does this
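
The "verify on output" step can be sketched as below. This is only an illustration, assuming the editor already holds the legal term keys obtained from a Vocabulary Server list call. The field names, the sample records, the keys other than DS02, and the helper `verify_record` are hypothetical; how Mikado actually does this is not described here.

```python
# Hypothetical sketch: check vocabulary-controlled fields when a record is output.
# Field names and vocabulary contents are illustrative assumptions.

def verify_record(record, controlled_fields):
    """Return (field, value) pairs whose value is not a legal vocabulary term.

    record            -- dict mapping field name to the value entered
    controlled_fields -- dict mapping field name to the set of legal terms
    A controlled field that is missing from the record is reported with None.
    """
    problems = []
    for field, legal_terms in controlled_fields.items():
        value = record.get(field)
        if value not in legal_terms:
            problems.append((field, value))
    return problems


# Term keys as they might be cached after a Vocabulary Server list call.
controlled = {"discipline": {"DS01", "DS02", "DS03"}}

good = {"title": "Cruise 42 CTD casts", "discipline": "DS02"}
bad = {"title": "Cruise 43 CTD casts", "discipline": "chemistry"}

print(verify_record(good, controlled))  # nothing to report
print(verify_record(bad, controlled))   # reports the 'discipline' field
```

The same term set can feed the drop-down list, so the values offered to the user and the values accepted on output come from one source.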



Strategy 2: constraint through validation

The problem is that not everybody uses the tools

SeaDataNet metadata documents include Schematron code to validate field content

The Schematron is maintained by software polling the Vocabulary Server API

Records are validated at source using a Schematron-aware tool (e.g. Oxygen 8 or later) or an on-line validation service
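
One way the polling software might regenerate the Schematron is sketched below. It assumes the current term list has already been fetched, and it writes a single rule asserting that an element's value is one of those terms. The element name `parameter`, the message text, and the function `build_schematron` are hypothetical; the real SeaDataNet rules are not shown in this presentation.

```python
# Hypothetical sketch: rebuild a Schematron schema from a polled term list.
from xml.sax.saxutils import quoteattr

SCH_NS = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron namespace


def build_schematron(context, terms):
    """Return a Schematron schema asserting that `context` holds a listed term.

    Assumes terms contain no apostrophes, so they can be XPath string literals.
    """
    for term in terms:
        if "'" in term:
            raise ValueError("term contains an apostrophe: %r" % term)
    # XPath test of the form: . = 'A' or . = 'B' or ...
    test = " or ".join(". = '%s'" % term for term in sorted(terms))
    message = "Value must come from the controlled vocabulary"
    return (
        '<sch:schema xmlns:sch="%s">\n'
        "  <sch:pattern>\n"
        "    <sch:rule context=%s>\n"
        "      <sch:assert test=%s>%s</sch:assert>\n"
        "    </sch:rule>\n"
        "  </sch:pattern>\n"
        "</sch:schema>\n"
    ) % (SCH_NS, quoteattr(context), quoteattr(test), message)


print(build_schematron("parameter", ["TEMP", "PSAL", "CPWC"]))
```

Rerunning this whenever the vocabulary changes keeps the validation rule in step with the master list without hand editing.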


Dynamic Drop-Down List Use Case

SeaDataNet marks up data using BODC Parameter Usage Vocabulary (21000 terms)

Navigation of something this size is a potential issue

Addressed by building three layers of increasingly broad terms over the top

Layers linked together using SKOS mappings



Search client required to exploit this

An obvious design for this is a series of drop-down lists working down the hierarchy

These need to be dynamically populated to keep up to date with the master vocabulary versions



The following URL gives all terms from the top level of the hierarchy:

http://vocab.ndg.nerc.ac.uk/list/P081/current

This may be used to set up a list of hot-linked labels pointing to Vocabulary Server concept URLs such as:

http://vocab.ndg.nerc.ac.uk/term/P081/3/DS02

Represents the concept ‘chemical oceanography’

When selected by the user, a Vocabulary Server call is issued and…



…we get a SKOS document thus

<?xml version="1.0" ?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:dc="http://purl.org/dc/elements/1.1/">

<skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P081/3/DS02">

<skos:externalID>SDN:P081:3:DS02</skos:externalID>

<skos:prefLabel>Chemical oceanography</skos:prefLabel>

<skos:altLabel>Chemical oceanography</skos:altLabel>

<skos:definition>The chemical oceanographic science domain</skos:definition>

<dc:date>2009-02-10T10:30:20.052+0000</dc:date>

<skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/B007" />

<skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C003" />













</skos:Concept>

</rdf:RDF>



This delivers a set of URIs from the next level down in the hierarchy

Again, these may be displayed as hot-linked labels and again the user selects one to drill down into the next layer of the hierarchy through another VS call

Maris BV in the Netherlands have linked this to Ajax to produce a client
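
A client-side drill-down step could look roughly like the sketch below, which pulls the preferred label and the next-level URIs out of a SKOS document of the shape shown earlier. It works on an embedded, shortened copy of that document rather than calling the server. The function name `next_level` is an assumption, and this is not the Maris client code.

```python
# Sketch: extract the label and next-level URIs from one SKOS concept document.
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Shortened copy of the document returned for concept DS02.
SAMPLE = """<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P081/3/DS02">
    <skos:prefLabel>Chemical oceanography</skos:prefLabel>
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/B007" />
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/C003" />
  </skos:Concept>
</rdf:RDF>
"""


def next_level(skos_xml):
    """Return (preferred label, list of narrowMatch URIs) for the first concept."""
    concept = ET.fromstring(skos_xml).find("skos:Concept", NS)
    label = concept.findtext("skos:prefLabel", namespaces=NS)
    uris = [m.get(RDF_RESOURCE) for m in concept.findall("skos:narrowMatch", NS)]
    return label, uris


label, uris = next_level(SAMPLE)
print(label)
for uri in uris:
    print("  ", uri)
```

Each returned URI is itself a concept URL, so the same function can be applied again when the user selects an entry in the next drop-down list.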


Semantic Crosswalk Use Case

BODC wishes to produce a GCMD DIF document from an EDMED V1.2 document

The “parameter” sections of the two documents are populated using different vocabularies (BODC PDV and GCMD Science Keywords)

This situation was usually addressed by having no parameter section in the output document. We can now do better…



A list of BODC PDV terms as parameter URNs is obtained from the EDMED document, for example:

SDN:P021:24:TEMP

SDN:P021:24:PSAL

SDN:P021:24:CPWC

This may then be translated into a list of URLs

http://vocab.ndg.nerc.ac.uk/term/P021/24/TEMP

http://vocab.ndg.nerc.ac.uk/term/P021/24/PSAL

http://vocab.ndg.nerc.ac.uk/term/P021/24/CPWC
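
The URN-to-URL translation can be sketched as a simple string rewrite. The mapping assumed here is the one visible in the SKOS example earlier, where the identifier SDN:P081:3:DS02 belongs to the concept at http://vocab.ndg.nerc.ac.uk/term/P081/3/DS02. The function name `urn_to_url` is hypothetical.

```python
# Sketch: translate a parameter URN into a Vocabulary Server concept URL.
# Assumed pattern: SDN:<list>:<version>:<term> -> <BASE>/<list>/<version>/<term>

BASE = "http://vocab.ndg.nerc.ac.uk/term"


def urn_to_url(urn):
    """Convert a URN such as 'SDN:P021:24:TEMP' into a concept URL."""
    authority, list_id, version, term = urn.split(":")
    if authority != "SDN":
        raise ValueError("unexpected URN authority: %r" % authority)
    return "%s/%s/%s/%s" % (BASE, list_id, version, term)


for urn in ["SDN:P021:24:TEMP", "SDN:P021:24:PSAL", "SDN:P021:24:CPWC"]:
    print(urn_to_url(urn))
```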



This list may be rolled into an HTTP GET request thus:

http://vocab.ndg.nerc.ac.uk/axis2/services/vocab/getRelatedRecordByTerm?subjectTerm=http://vocab.ndg.nerc.ac.uk/term/P021/current/TEMP&subjectTerm=http://vocab.ndg.nerc.ac.uk/term/P021/current/PSAL&subjectTerm=http://vocab.ndg.nerc.ac.uk/term/P021/current/CPWC&objectList=http://vocab.ndg.nerc.ac.uk/list/P041/current&predicate=255&inferences=true
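
Assembling such a request in code might look like the sketch below. The method name and parameter names are taken from the request above, with the stray spaces introduced by the slide layout removed. The helper `build_crosswalk_request` is hypothetical, and the sketch only builds the URL string without contacting the server.

```python
# Sketch: build the crosswalk GET request from a list of subject term URLs.
from urllib.parse import urlencode

ENDPOINT = "http://vocab.ndg.nerc.ac.uk/axis2/services/vocab/getRelatedRecordByTerm"


def build_crosswalk_request(subject_terms, object_list,
                            predicate="255", inferences="true"):
    """Return the request URL, with one subjectTerm parameter per term."""
    pairs = [("subjectTerm", term) for term in subject_terms]
    pairs += [
        ("objectList", object_list),
        ("predicate", predicate),
        ("inferences", inferences),
    ]
    # safe=":/" leaves the embedded URLs readable, as in the example request.
    return ENDPOINT + "?" + urlencode(pairs, safe=":/")


terms = [
    "http://vocab.ndg.nerc.ac.uk/term/P021/current/TEMP",
    "http://vocab.ndg.nerc.ac.uk/term/P021/current/PSAL",
    "http://vocab.ndg.nerc.ac.uk/term/P021/current/CPWC",
]
print(build_crosswalk_request(terms, "http://vocab.ndg.nerc.ac.uk/list/P041/current"))
```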

An XML document is returned containing the GCMD Science Keywords that map to the three BODC terms as both text strings and URLs

The document may be reformatted using XSLT or XQuery to generate the “parameters” section for the DIF


Smart Discovery Use Case

Ability to find datasets tagged ‘rainfall’ using the search term ‘precipitation’

Also includes so-called ‘faceted searches’

Find one ‘type of thing’ by searching for another ‘type of thing’

For example: find datasets tagged ‘CTD’ (an instrument type) using the search term ‘salinity’ (a phenomenon)

Requires the semantically rich relation ‘Salinity measuredBy CTD’

System needs to understand ‘measuredBy’ (requires rules)
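
The idea of a rule that "understands" a relation can be illustrated with a toy example. The relations, dataset names, and tags below are invented for illustration, and `synonymOf` is a hypothetical predicate name; only `measuredBy` and the rainfall/precipitation and salinity/CTD pairings come from the slides. The "rule" here is simply the set of predicates the search is allowed to follow.

```python
# Toy sketch of smart discovery: expand a search term through known relations.
RELATIONS = [
    ("precipitation", "synonymOf", "rainfall"),
    ("salinity", "measuredBy", "CTD"),
]

# Hypothetical datasets and the tags they carry.
DATASETS = {
    "dataset-A": {"rainfall"},
    "dataset-B": {"CTD"},
    "dataset-C": {"wind speed"},
}

# The rule: predicates that may be followed when widening a search.
EXPANDING_PREDICATES = {"synonymOf", "measuredBy"}


def expand(term):
    """Return the search term plus every tag reachable through one relation."""
    tags = {term}
    for subject, predicate, obj in RELATIONS:
        if subject == term and predicate in EXPANDING_PREDICATES:
            tags.add(obj)
    return tags


def discover(term):
    """Return names of datasets carrying the term or a related tag."""
    wanted = expand(term)
    return sorted(name for name, tags in DATASETS.items() if tags & wanted)


print(discover("precipitation"))
print(discover("salinity"))
```

Removing `measuredBy` from `EXPANDING_PREDICATES` makes the salinity search return nothing, which is the sense in which the system has to know what the relation means.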


Smart Discovery Use Case

Operational Smart Discovery requires:

An extensively populated full-blown ontology

A state of the art inference engine

VS API has Smart Discovery support methods

Based on SQL search on relational triple store

Inference functionality would need a locally-developed inference engine

Produces impressive demonstrations but is not scalable to operational use
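
What an SQL search on a relational triple store involves can be sketched with an in-memory SQLite table. The table layout and the third triple are assumptions made for illustration; the schema actually used behind the VS API is not described here.

```python
# Sketch: single-hop lookup on a relational triple store (hypothetical schema).
import sqlite3


def make_store(triples):
    """Create an in-memory triple table and load it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", triples)
    return conn


def related(conn, subject, predicate):
    """Return objects directly linked to `subject` by `predicate`."""
    rows = conn.execute(
        "SELECT object FROM triple "
        "WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]


store = make_store([
    ("DS02", "narrowMatch", "B007"),
    ("DS02", "narrowMatch", "C003"),
    ("B007", "narrowMatch", "EXAMPLE"),  # invented second-level triple
])
print(related(store, "DS02", "narrowMatch"))
```

The query returns only direct matches. Reaching `EXAMPLE` from `DS02` means following a chain of relations, which is the kind of work an inference layer would have to add on top of plain SQL.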


VS Usage Models

The dynamic drop-down list use case may be implemented in at least three ways

1. Client issues a VS call on each user interaction returning a relatively small XML document

2. Client uses one VS call to download the entire thesaurus into an RDF-aware tool and then interacts through a local API

3. Entire thesaurus loaded into RDF-aware tool on the server that is interrogated by the client through something like SPARQL



Method 1

Experience shows it to work well for the first three use cases

Smart Discovery could potentially require hundreds of server calls per query

Method 2

Requires a thick client

Could be part of an installed package

Provides access to inference engines

Well-suited to Smart Discovery

Untested as far as we know
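
A rough sketch of the Method 2 pattern follows: the thesaurus is parsed once into a local index, and later drill-down steps are answered from that index with no further server calls. Python's standard XML parser stands in for an RDF-aware tool, the embedded document is a two-concept stand-in for a full download, and the label given to B007 is a placeholder rather than the real term.

```python
# Sketch of Method 2: one download, then purely local look-ups.
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Stand-in for the thesaurus as it might arrive from a single VS call.
THESAURUS = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P081/3/DS02">
    <skos:prefLabel>Chemical oceanography</skos:prefLabel>
    <skos:narrowMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P031/12/B007" />
  </skos:Concept>
  <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P031/12/B007">
    <skos:prefLabel>Placeholder narrower concept</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>
"""


def build_index(rdf_xml):
    """Map each concept URI to its label and its narrowMatch URIs."""
    index = {}
    for concept in ET.fromstring(rdf_xml).findall("skos:Concept", NS):
        index[concept.get(RDF_ABOUT)] = {
            "label": concept.findtext("skos:prefLabel", default="", namespaces=NS),
            "narrower": [m.get(RDF_RESOURCE)
                         for m in concept.findall("skos:narrowMatch", NS)],
        }
    return index


def narrower_labels(index, uri):
    """Return labels of the next level down, using only the local index."""
    return [index[n]["label"] for n in index[uri]["narrower"] if n in index]


index = build_index(THESAURUS)
print(narrower_labels(index, "http://vocab.ndg.nerc.ac.uk/term/P081/3/DS02"))
```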



Method 3

Being developed by Marine Metadata Interoperability (MMI) project based on OWL rather than SKOS.

Provides access to inference engines

Well-suited to Smart Discovery support
