Simon J D Cox, Jonathan Yu, Megan Williams, Fabrizio Giabardo, Dominic Lowe 16 April 2015 LAND AND WATER FLAGSHIP Technologies and practices for maintaining and publishing earth science vocabularies

Technologies and practices for maintaining and publishing earth science vocabularies

Jul 20, 2015

Transcript
Page 1: Technologies and practices for maintaining and publishing earth science vocabularies

Simon J D Cox, Jonathan Yu, Megan Williams, Fabrizio Giabardo, Dominic Lowe

16 April 2015

LAND AND WATER FLAGSHIP

Technologies and practices for maintaining and publishing earth science vocabularies

Page 2: Technologies and practices for maintaining and publishing earth science vocabularies

Are these the same?

Publishing earth science vocabularies | Cox, Yu, Williams, Giabardo, Lowe

“nitrogen”

“dissolved nitrogen”

“Total nitrogen, water, filtered, milligrams per liter”

“Concentration of nitrogen (total) per unit volume of the water body [dissolved plus reactive particulate phase] by oxidation and colorimetric autoanalysis”

“Concentration of nitrogen (total) per unit mass of the water body [dissolved plus reactive particulate <GF/F phase] by filtration and high temperature Pt catalytic oxidation”

“Concentration (moles or mass) of total nitrogen (i.e. nitrogen in all chemical forms) in suspended particulate material per unit volume of the water column.”

“Concentration of nitrogen (total) {'PON'} per unit volume of the water body [particulate 2-10um phase] by filtration, acidification and elemental analysis”

“Dissolved total and organic nitrogen concentrations in the water column”


Page 3: Technologies and practices for maintaining and publishing earth science vocabularies

Why are vocabularies important?


Page 4: Technologies and practices for maintaining and publishing earth science vocabularies

O&M domain specialization

OM_Observation

+ phenomenonTime

+ resultTime

+ validTime [0..1]

+ resultQuality [0..*]

+ parameter [0..*]

Associations from OM_Observation:

• +observedProperty [1] → GF_PropertyType

• +featureOfInterest [1] → GFI_Feature (inverse role +propertyValueProvider [0..*])

• +procedure [1] → OM_Process (inverse role +generatedObservation [0..*])

• +result → Any (association: Range)

Also shown: GFI_DomainFeature

Specialization points:

• observed property: Parameter dictionary

• procedure: Register of sensors, processes & algorithms

• feature of interest: Feature-type catalogue, Feature service

• result format: GML, SWE, netCDF, JSON, SQLite...


Page 5: Technologies and practices for maintaining and publishing earth science vocabularies

RDF Data Cube 101 - Slices and observations

[Figure: a data cube with dimensions d1 to d7, measures m1, m2, … and attributes a1, a2, …, showing the Cube, a Slice through it, and a single Observation.]

A linked sensor data cube, Lefort, 5th Intl. SSN workshop, 2012

Page 6: Technologies and practices for maintaining and publishing earth science vocabularies

W3C Data Cube ontology


• Each axis or variable specified as a skos:Concept

• Values of coded-properties selected from a skos:ConceptScheme

• Homogeneous observations, common structure definition

The RDF Data Cube Vocabulary, Cyganiak & Reynolds, W3C Recommendation 2014
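The three points above can be sketched in Turtle. This is an illustrative fragment only: the `ex:` namespace, the phase code list and the observation value are made up for the example, while the `qb:` and `skos:` terms come from the W3C vocabularies.

```turtle
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/def/> .   # hypothetical namespace

# The variable behind the axis is itself a concept ...
ex:nitrogen-phase a skos:Concept ;
    skos:prefLabel "phase of nitrogen sampled"@en .

# ... and its permitted values come from a concept scheme.
ex:phases a skos:ConceptScheme ;
    skos:hasTopConcept ex:dissolved , ex:particulate .
ex:dissolved   a skos:Concept ; skos:inScheme ex:phases ; skos:prefLabel "dissolved"@en .
ex:particulate a skos:Concept ; skos:inScheme ex:phases ; skos:prefLabel "particulate"@en .

# A coded dimension ties the axis to its concept and its code list.
ex:phase a qb:DimensionProperty , qb:CodedProperty ;
    qb:concept  ex:nitrogen-phase ;
    qb:codeList ex:phases ;
    rdfs:range  skos:Concept .

ex:concentration a qb:MeasureProperty ;
    rdfs:label "concentration"@en .

# One structure definition shared by every observation in the data set.
ex:dsd a qb:DataStructureDefinition ;
    qb:component [ qb:dimension ex:phase ] ,
                 [ qb:measure   ex:concentration ] .

ex:dataset a qb:DataSet ; qb:structure ex:dsd .

# A single observation; the value is invented for illustration.
ex:obs1 a qb:Observation ;
    qb:dataSet       ex:dataset ;
    ex:phase         ex:dissolved ;
    ex:concentration "0.42"^^xsd:decimal .
```

Because the dimension values are URIs from a published scheme, two cubes that use the same code list can be compared without string matching.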

Page 7: Technologies and practices for maintaining and publishing earth science vocabularies

What is available?


Page 8: Technologies and practices for maintaining and publishing earth science vocabularies

AGU


Page 9: Technologies and practices for maintaining and publishing earth science vocabularies

Thomson-Reuters


Page 10: Technologies and practices for maintaining and publishing earth science vocabularies

ANZSRC


Page 11: Technologies and practices for maintaining and publishing earth science vocabularies


Page 12: Technologies and practices for maintaining and publishing earth science vocabularies

ICS


Page 13: Technologies and practices for maintaining and publishing earth science vocabularies

GSSP


Page 14: Technologies and practices for maintaining and publishing earth science vocabularies

GCMD


Page 15: Technologies and practices for maintaining and publishing earth science vocabularies

Standard ontology of chemicals


Page 16: Technologies and practices for maintaining and publishing earth science vocabularies

Vocabulary formalization


Page 17: Technologies and practices for maintaining and publishing earth science vocabularies

Formalization: RDF – SKOS for basic vocabularies


chem:sodium
    a skos:Concept ;
    rdfs:label "sodium"^^xsd:string ;
    skos:broader chem:alkali ;
    skos:exactMatch <http://dbpedia.org/resource/Sodium> ;
    skos:inScheme skos:chemicals ;
    skos:prefLabel "nátrium"@hu , "sodio"@it , "sodium"@fr , "sodium"@en .

Page 18: Technologies and practices for maintaining and publishing earth science vocabularies

RDFS

Semantic web dead long live semantic web | Simon Cox

[Diagram: an RDFS schema relating the classes GeochronEra and TemporalReferenceSystem and the properties component and member to skos:Concept, skos:ConceptScheme, skos:hasTopConcept and skos:narrower, using subClassOf, subPropertyOf, domain and range.]

Page 19: Technologies and practices for maintaining and publishing earth science vocabularies

Inferencing

• Entailments and reasoning

• What does this combination of axioms imply?

• Is there anything unexpected?

[Diagram. Individuals: StratigraphicChart, Phanerozoic, Cenozoic, Neogene. Classes: TemporalReferenceSystem, GeochronEra, ConceptScheme, Concept. Edge labels: type, component, member, hasTopConcept, narrower, narrowerTransitive, broaderTransitive.]
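To make the question "what does this combination of axioms imply?" concrete, here is a small sketch in Turtle. The `ex:` namespace is hypothetical, and the pairing of classes and properties is an assumed reading of the two diagrams, not a copy of the authors' ontology. Entailed triples are written as comments so that the asserted part stays valid on its own.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/timescale/> .   # hypothetical namespace

# Schema axioms (assumed reading of the RDFS diagram)
ex:GeochronEra             rdfs:subClassOf skos:Concept .
ex:TemporalReferenceSystem rdfs:subClassOf skos:ConceptScheme .

ex:component rdfs:subPropertyOf skos:hasTopConcept ;
    rdfs:domain ex:TemporalReferenceSystem ;
    rdfs:range  ex:GeochronEra .

ex:member rdfs:subPropertyOf skos:narrower ;
    rdfs:domain ex:GeochronEra ;
    rdfs:range  ex:GeochronEra .

# Asserted data: only three triples
ex:StratigraphicChart ex:component ex:Phanerozoic .
ex:Phanerozoic        ex:member    ex:Cenozoic .
ex:Cenozoic           ex:member    ex:Neogene .

# Entailed under RDFS (subPropertyOf, domain, range, subClassOf):
#   ex:StratigraphicChart a ex:TemporalReferenceSystem , skos:ConceptScheme ;
#       skos:hasTopConcept ex:Phanerozoic .
#   ex:Phanerozoic a ex:GeochronEra , skos:Concept ; skos:narrower ex:Cenozoic .
#   ex:Cenozoic    a ex:GeochronEra , skos:Concept ; skos:narrower ex:Neogene .
#   ex:Neogene     a ex:GeochronEra , skos:Concept .
#
# Entailed additionally when the SKOS schema is loaded into an OWL-aware reasoner
# (narrower is a sub-property of narrowerTransitive, which is transitive and the
# inverse of broaderTransitive):
#   ex:Phanerozoic skos:narrowerTransitive ex:Cenozoic , ex:Neogene .
#   ex:Neogene     skos:broaderTransitive  ex:Cenozoic , ex:Phanerozoic .
```

Reviewing the entailed triples is one way to spot anything unexpected, for example a domain axiom that silently types a resource as a concept scheme.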


Page 20: Technologies and practices for maintaining and publishing earth science vocabularies

Formalization and encoding process

Create order within existing Excel spreadsheets

Every layout is different


Page 21: Technologies and practices for maintaining and publishing earth science vocabularies

Formalization and encoding process

RDF123

Every mapping is different


Page 22: Technologies and practices for maintaining and publishing earth science vocabularies

Formalization and encoding process

Turtle, in text editor …


Page 23: Technologies and practices for maintaining and publishing earth science vocabularies

People + judgement


Page 24: Technologies and practices for maintaining and publishing earth science vocabularies

Vocabulary distribution


Page 25: Technologies and practices for maintaining and publishing earth science vocabularies

Delivery

• Physical documents, PDF

• Tables on web pages

• Bespoke XML documents

• RDF documents, OWL documents

• Web services

• RESTful web resources, Linked data

Vocabulary services | Cox & Yu

Page 26: Technologies and practices for maintaining and publishing earth science vocabularies

Publish as linked data

URI = web-scale foreign-key
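A sketch of the foreign-key idea, assuming two unrelated catalogues that both tag records with the same concept URI. The two `ex` namespaces, record names and titles are invented for illustration; the concept URI is the re-based GCMD keyword that appears later in this deck.

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix exa: <http://example.org/agency-a/> .   # hypothetical catalogue A
@prefix exb: <http://example.org/agency-b/> .   # hypothetical catalogue B

# Two records held by different organisations ...
exa:dataset-17 dct:title "Coastal erosion survey"@en ;
    dct:subject <http://registry.it.csiro.au/def/kwa/gcmd/ABRASION> .

exb:report-3 dct:title "Glacial landform mapping"@en ;
    dct:subject <http://registry.it.csiro.au/def/kwa/gcmd/ABRASION> .

# ... can be joined on the shared URI, with no agreement needed on
# spelling, case or language of the keyword label.
```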


Page 27: Technologies and practices for maintaining and publishing earth science vocabularies

Linked vocabularies can be shared and re-used


Page 28: Technologies and practices for maintaining and publishing earth science vocabularies


Page 29: Technologies and practices for maintaining and publishing earth science vocabularies

Status and lifecycle


Page 30: Technologies and practices for maintaining and publishing earth science vocabularies


Page 31: Technologies and practices for maintaining and publishing earth science vocabularies

Governance issues, design flaws


Page 32: Technologies and practices for maintaining and publishing earth science vocabularies

Governance issues

What is the best way to re-use existing content already published as linked data?

Do we fix it for them? Do we re-claim it?

Vocabulary deployment and governance | Cox

Page 33: Technologies and practices for maintaining and publishing earth science vocabularies

Modeling flaws

GCMD science keywords

• Same textual definition, same label

• Different parent, different URI

– are they the same concept?


Page 34: Technologies and practices for maintaining and publishing earth science vocabularies

Re-base the URI?

<http://registry.it.csiro.au/def/kwa/gcmd/ABRASION>
    a skos:Concept ;
    rdfs:label "ABRASION" ;
    dct:description "Mechanical scraping of a rock surface by friction between rocks and moving particles."@en ;
    owl:sameAs
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/8f57f4b0-5177-4362-81e8-ced75d37d1aa> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/fd29bf77-df38-4b80-8148-8184fa41d843> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/efacd4f6-59ea-4019-8265-8cc81ecc99c0> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/f6e19e2e-555a-4d40-9833-c7513d92c813> ;
    skos:prefLabel "ABRASION"@en .


Page 35: Technologies and practices for maintaining and publishing earth science vocabularies

Versioning flaws

NASA SWEET

http://sweet.jpl.nasa.gov/1.1/time.owl#PLIOCENE

http://sweet.jpl.nasa.gov/2.0/timeGeologic.owl#Pliocene

http://sweet.jpl.nasa.gov/2.1/reprTimeGeologicPeriod.owl#Pliocene

http://sweet.jpl.nasa.gov/2.2/stateTimeGeologic.owl#Pliocene

http://sweet.jpl.nasa.gov/2.3/stateTimeGeologic.owl#Pliocene

• Same label, and same place in hierarchy

• Different URI

- are they the same concept?


Page 36: Technologies and practices for maintaining and publishing earth science vocabularies

Governance issues

Who is the expert? - Wikipedia??


Page 37: Technologies and practices for maintaining and publishing earth science vocabularies

Collection sub-set?

<http://registry.it.csiro.au/def/kwa/gcmd/GCMD-keywords-subset_newnames>
    a skos:Collection ;
    rdfs:label "Subset of GCMD keywords - re-based"^^xsd:string ;
    skos:member
        <http://registry.it.csiro.au/def/kwa/gcmd/ABLATION> ,
        <http://registry.it.csiro.au/def/kwa/gcmd/ABRASION> ,
        <http://registry.it.csiro.au/def/kwa/gcmd/ABLATION-ZONES-ACCUMULATION-ZONES> .

- Or -

<http://registry.it.csiro.au/def/kwa/gcmd/GCMD-keywords-subset>
    a skos:Collection ;
    rdfs:label "Subset of GCMD keywords"^^xsd:string ;
    skos:member
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/8f57f4b0-5177-4362-81e8-ced75d37d1aa> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/95fbaefd-1afe-4887-a1ba-fc338a8109bb> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/99db4dca-4d07-48fd-8ba3-393532d04aa6> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/a994a6f6-cfcd-45d2-95a4-0f8455a9454d> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/efacd4f6-59ea-4019-8265-8cc81ecc99c0> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/fd29bf77-df38-4b80-8148-8184fa41d843> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/f6e19e2e-555a-4d40-9833-c7513d92c813> .


Page 38: Technologies and practices for maintaining and publishing earth science vocabularies

More complex constraints?

OWL classes vs instances

cgi-lith-instance:carbonate_rich_mudstone
    a skos:Concept ;
    rdfs:label "carbonate-rich mudstone" ;
    skos:broader cgi-lith-instance:rock_material ;
    CGI_Lith:ConsolDegree CGI_Lith:consolidated ;
    CGI_Lith:Constituents CGI_Lith:carbonateBearing ;
    CGI_Lith:GeneticCateg CGI_Lith:sedimentary ;
    CGI_Lith:GrainSize CGI_Lith:mud_size ;
    CGI_Lith:ParticleType CGI_Lith:grain .


Page 39: Technologies and practices for maintaining and publishing earth science vocabularies

Summary and conclusions


Page 40: Technologies and practices for maintaining and publishing earth science vocabularies


[Diagram: Source vocabulary (csv, html, txt) → Formalized vocabulary (skos/rdf) → Database (triple-store) → Vocab service (LDR API SPARQL, SISSVoc)]

Page 41: Technologies and practices for maintaining and publishing earth science vocabularies

Summary

• Term vocabularies can be formalized in RDF (SKOS, OWL) and published as linked data

• Much content available, but needs converting (‘lifting’) to semantic technologies

• Excel, RDF123, Text editor, SKOS, LDR and SISSVoc are our enablers (but people are essential)


Page 42: Technologies and practices for maintaining and publishing earth science vocabularies

Applications and published vocabularies

• GeoSciML vocabularies
  • http://def.seegrid.csiro.au/sissvoc/cgi201211/collection
  • http://resource.geosciml.org/classifier/ics/ischart/

• Environmental observations vocabularies
  • http://environment.data.gov.au/def/
  • http://registry.it.csiro.au/environment/def

• Bioregional assessments glossary
  • http://registry.it.csiro.au/test1/ba-glossary

• Agriculture definitions
  • http://registry.it.csiro.au/agriculture/def

• Australian Government definitions - AGIFT 2014, ANZSRC 2008 …
  • http://registry.it.csiro.au/agldwg/def

• CSIRO Keyword aggregator …
  • Coming soon


Page 43: Technologies and practices for maintaining and publishing earth science vocabularies

LAND AND WATER FLAGSHIP

Thank you

Environmental Informatics Infrastructure

Simon J D Cox
Research Scientist
t +61 3 9252 6342
e [email protected]
people.csiro.au/C/S/Simon-Cox

Jonathan Yu
Research Engineer
t +61 3 9252 6440
e [email protected]
people.csiro.au/C/S/Jonathan-Yu

Page 44: Technologies and practices for maintaining and publishing earth science vocabularies


Page 45: Technologies and practices for maintaining and publishing earth science vocabularies

SISSVoc UI & API for vocabulary query


Page 46: Technologies and practices for maintaining and publishing earth science vocabularies


Page 47: Technologies and practices for maintaining and publishing earth science vocabularies

Simple Knowledge Organization System (SKOS): a W3C Standard

Focus on the concept rather than the term

• Web/Linked data principle: Concept is identified by a URI

• Concept is annotated with text labels (i.e. the traditional ‘term’)

• Structured using hierarchical relations within a vocabulary
  • broader, narrower

• Matching relations between vocabularies
  • broadMatch, closeMatch, exactMatch
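The difference between hierarchical relations inside one vocabulary and matching relations across vocabularies can be sketched as follows. Both `ex` namespaces and all concept names are hypothetical, and the mappings are illustrative rather than asserted by any real vocabulary.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix exa:  <http://example.org/vocab-a/> .   # hypothetical vocabulary A
@prefix exb:  <http://example.org/vocab-b/> .   # hypothetical vocabulary B

# Within vocabulary A: the URI identifies the concept, labels carry the terms,
# and broader/narrower give the hierarchy.
exa:nitrogen a skos:Concept ;
    skos:inScheme  exa:scheme ;
    skos:prefLabel "nitrogen"@en ;
    skos:narrower  exa:dissolved-nitrogen .

exa:dissolved-nitrogen a skos:Concept ;
    skos:inScheme  exa:scheme ;
    skos:prefLabel "dissolved nitrogen"@en ;
    skos:altLabel  "nitrogen, dissolved"@en ;
    skos:broader   exa:nitrogen .

# Across vocabularies: mapping relations, chosen by how sure the mapper is.
exa:nitrogen           skos:exactMatch exb:N .                    # same concept
exa:dissolved-nitrogen skos:closeMatch exb:N-filtered ;           # similar, not identical
                       skos:broadMatch exb:N .                    # B only has the broader concept
```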


Page 48: Technologies and practices for maintaining and publishing earth science vocabularies

Delivery

• Physical documents, PDF

• Tables on web pages

• Bespoke XML documents

• RDF documents, OWL documents

• Web services

• RESTful web resources, Linked data

Page 49: Technologies and practices for maintaining and publishing earth science vocabularies

O&M

OM_Observation

+ phenomenonTime

+ resultTime

+ validTime [0..1]

+ resultQuality [0..*]

+ parameter [0..*]

Associations from OM_Observation:

• +observedProperty [1] → GF_PropertyType

• +featureOfInterest [1] → GFI_Feature (observation end: 0..*)

• +procedure [1] → OM_Process (observation end: 0..*)

• +result → Any

ISO 19156:2011 Geographic Information – Observations and measurements – ed. S Cox

Page 50: Technologies and practices for maintaining and publishing earth science vocabularies

Governance

Clear roles:

• Content is determined by the experts

• Formalization may uncover inconsistencies

• History and status must be visible

• No deletions! - retirement or supersession
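One possible way to record retirement and supersession without deleting anything is sketched below. The `ex:` namespace and both concepts are hypothetical, and the choice of `owl:deprecated`, `dct:isReplacedBy` and `skos:historyNote` is an assumption about a reasonable encoding, not a description of what the registry used by the authors actually stores.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/def/time/> .   # hypothetical namespace

# The retired concept keeps its URI and its description,
# so old data that points at it still resolves.
ex:PLIOCENE a skos:Concept ;
    skos:prefLabel   "PLIOCENE"@en ;
    owl:deprecated   true ;
    dct:isReplacedBy ex:Pliocene ;
    skos:historyNote "Retired; superseded by ex:Pliocene."@en .

# The successor points back, making the history visible from both ends.
ex:Pliocene a skos:Concept ;
    skos:prefLabel "Pliocene"@en ;
    dct:replaces   ex:PLIOCENE .
```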
