Semantic challenges in sharing dataset metadata and creating federated dataset catalogs
The example of the CIARD RING
Valeria Pesce (Global Forum on Agricultural Research and Innovation)
Linked Open Data in Agriculture
MACS-G20 Workshop in Berlin, September 27th–28th, 2017
Semantics involved in describing datasets

Dataset
• Properties: Name, Owner, Type of data, Topic(s), Data standards used, Data structure, Place of collection, Date of collection, Distribution(s), […]
• Metadata vocabulary or ontology for describing datasets, e.g. DCAT or DataCube

Dataset structure
• Properties: Dimensions, Attributes, Measures, Value lists
• Metadata vocabulary or ontology for describing data structures, e.g. DataCube or STAT-DCAT

Distribution
• Properties: Protocol, URL, Format, Size
• Metadata vocabulary or ontology for describing distributions, e.g. DCAT or VOID

“Description vocabularies”
• The metadata vocabularies or ontologies listed above

“Value vocabularies” / Knowledge Organization Systems
• Authority data: data of type Organization, e.g. VIAF
• KOS / thesaurus: concepts suitable for organizing by Topic, e.g. AGROVOC
• KOS / classification: concepts describing Types of data
• Schema for describing geospatial entities, e.g. GML

No universally agreed model or vocabulary!
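As a minimal sketch of how these layers combine, the Python below builds one JSON-LD-style dataset description. The property names come from description vocabularies (DCAT and Dublin Core Terms, both real namespaces), while the values for owner, topic and place are URIs taken from value vocabularies. The dataset, every `example.org` URI and the helper name `describe_dataset` are hypothetical placeholders chosen for illustration.

```python
import json

# Namespace prefixes of the description vocabularies.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def describe_dataset(title, publisher_uri, topic_uris, place_uri, distributions):
    """Return a JSON-LD-style dict describing a dataset with DCAT terms."""
    return {
        "@context": CONTEXT,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # The values below are URIs from value vocabularies, not free text.
        "dct:publisher": {"@id": publisher_uri},
        "dcat:theme": [{"@id": uri} for uri in topic_uris],
        "dct:spatial": {"@id": place_uri},
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:accessURL": {"@id": d["url"]},
                "dcat:mediaType": d["media_type"],
                "dcat:byteSize": d["size_bytes"],
            }
            for d in distributions
        ],
    }


# Hypothetical example record; none of these URIs are real identifiers.
record = describe_dataset(
    title="Soil samples (illustrative)",
    publisher_uri="http://example.org/organizations/some-institute",
    topic_uris=["http://example.org/thesaurus/soil"],
    place_uri="http://example.org/places/some-country",
    distributions=[
        {
            "url": "http://example.org/data/soil.csv",
            "media_type": "text/csv",
            "size_bytes": 2048,
        }
    ],
)

print(json.dumps(record, indent=2))
```

Two catalogs that both use these property names can still disagree on every value, which is why the value vocabularies matter as much as the schema.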
Semantics involved in describing datasets: the dataset resource
• Dataset properties: Name, Owner, Type of data, Topic(s), Data standards used, Data structure, Place of collection, Date of collection, Distribution(s), […]
• Metadata vocabulary or ontology for describing datasets, e.g. DCAT or DataCube
Semantics involved in describing datasets: the dataset structure
• Dataset structure properties: Dimensions, Attributes, Measures, Value lists
• Metadata vocabulary or ontology for describing data structures, e.g. DataCube or STAT-DCAT
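The dataset structure layer (dimensions, measures, attributes) could be sketched with terms from the RDF Data Cube vocabulary, as below. The `qb:` terms and namespace are real; the structure URI, the component URIs and the helper name `describe_structure` are hypothetical and only illustrate the shape of such a description.

```python
# Namespace of the RDF Data Cube vocabulary.
QB = "http://purl.org/linked-data/cube#"


def describe_structure(dsd_uri, dimensions, measures, attributes):
    """Return a JSON-LD-style dict for a qb:DataStructureDefinition."""
    components = (
        [{"qb:dimension": {"@id": uri}} for uri in dimensions]
        + [{"qb:measure": {"@id": uri}} for uri in measures]
        + [{"qb:attribute": {"@id": uri}} for uri in attributes]
    )
    return {
        "@context": {"qb": QB},
        "@id": dsd_uri,
        "@type": "qb:DataStructureDefinition",
        "qb:component": components,
    }


# Hypothetical structure: two dimensions, one measure, one attribute.
structure = describe_structure(
    dsd_uri="http://example.org/structures/soil-samples",
    dimensions=[
        "http://example.org/properties/samplingSite",
        "http://example.org/properties/samplingDate",
    ],
    measures=["http://example.org/properties/phValue"],
    attributes=["http://example.org/properties/unitOfMeasure"],
)

for component in structure["qb:component"]:
    print(component)
```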
Semantics involved in describing datasets: dataset serialization
• Distribution properties: Protocol, URL, Format, Size
• Metadata vocabulary or ontology for describing distributions, e.g. DCAT or VOID
Semantics needed to describe datasets: reference value vocabularies
• Authority data: data of type Organization, e.g. VIAF
• KOS / thesaurus: concepts suitable for organizing by Topic, e.g. AGROVOC
• KOS / classification: concepts describing Types of data
• Schema for describing geospatial entities, e.g. GML
Semantics of the values
• Standardization of the values, e.g. for “thematic coverage” or “dimensions” of datasets, “format” or “protocol used” of distributions etc.
• The value should be standardized, possibly a URI
• The value should be part of an authority list / code list
RDF dataset vocabularies normally treat these values as resources, so identifiable by URIs, BUT…
• Cross-domain
• Authority lists of organizations, projects: VIAF, CERIF, ORCID?
• Geospatial / geopolitical data: GeoNames, FAO Geopolitical Ontology
• Data formats / data standards? AgriSemantics Map of Standards
• File formats: IANA types (RDF?), W3C formats
• Agreed list of types of data?
• Units of measure?
• Authority list of licenses (OpenDefinition list?)
Not for everything we would need!
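Standardizing values against a code list can be sketched as a lookup that either returns a URI or reports the value as unmatched. The example below does this for the “format” of a distribution, using a handful of registered IANA media types. The selection of labels, the URI pattern used as the base and the function name `standardize_format` are assumptions made for illustration, not a prescribed mapping.

```python
# Small code list: lowercase labels mapped to IANA media types.
MEDIA_TYPES = {
    "csv": "text/csv",
    "json": "application/json",
    "xml": "application/xml",
    "turtle": "text/turtle",
    "rdf/xml": "application/rdf+xml",
}

# Assumed URI pattern for media types in the IANA registry.
IANA_BASE = "https://www.iana.org/assignments/media-types/"


def standardize_format(raw):
    """Map a free-text format label to a media type URI, or None if unknown."""
    key = raw.strip().lower().lstrip(".")
    media_type = MEDIA_TYPES.get(key)
    return IANA_BASE + media_type if media_type else None


raw_values = ["CSV", " .json", "Turtle", "Excel-ish dump"]
standardized = {value: standardize_format(value) for value in raw_values}

# Values with no entry in the code list need manual curation.
unmatched = [value for value, uri in standardized.items() if uri is None]

print(standardized)
print("Unmatched:", unmatched)
```

The unmatched list is the interesting part: where no agreed authority list exists, there is nothing to map to.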
The CIARD RING
The CIARD RING is a federated and curated catalog of agri-food datasets and data services
http://ring.ciard.net
• a primary catalog (providers can catalog individual data services and datasets directly in the RING) exposing all metadata as RDF
• a federated catalog (it harvests dataset metadata from other catalogs)
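The combination of a primary and a federated catalog can be sketched as a merge of two record lists keyed by dataset URI. This is only an illustration of the idea: the record shape, the `source` field, the function name `merge_catalogs` and the rule that directly cataloged records win over harvested ones are all assumptions, not a description of how the RING actually resolves duplicates.

```python
def merge_catalogs(primary, harvested):
    """Merge record lists keyed by dataset URI; primary records win on conflict."""
    merged = {}
    # Harvested records go in first so primary records overwrite them.
    for source, records in (("harvested", harvested), ("primary", primary)):
        for record in records:
            merged[record["uri"]] = {**record, "source": source}
    return sorted(merged.values(), key=lambda r: r["uri"])


# Hypothetical records; the URIs and titles are placeholders.
primary_records = [
    {"uri": "http://example.org/datasets/a", "title": "Dataset A (curated)"},
]
harvested_records = [
    {"uri": "http://example.org/datasets/a", "title": "Dataset A"},
    {"uri": "http://example.org/datasets/b", "title": "Dataset B"},
]

catalog = merge_catalogs(primary_records, harvested_records)
for entry in catalog:
    print(entry["uri"], "|", entry["title"], "|", entry["source"])
```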
URI that identifies the INSPIRE specification for Soil
Queries can leverage LOD mappings - 3
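One way to picture a query that leverages LOD mappings: before matching datasets by topic, the query concept is expanded to every concept linked to it through equivalence mappings (in the spirit of `skos:exactMatch`, which is symmetric and transitive). The sketch below does this in memory. The vocabularies, concept URIs, dataset names and function names are all hypothetical.

```python
# Hypothetical equivalence mappings between concept URIs of three vocabularies.
MAPPINGS = [
    ("http://example.org/vocabA/soil", "http://example.org/vocabB/soils"),
    ("http://example.org/vocabB/soils", "http://example.org/vocabC/S01"),
]

# Hypothetical datasets, each tagged with topics from a different vocabulary.
DATASETS = {
    "dataset-1": {"http://example.org/vocabA/soil"},
    "dataset-2": {"http://example.org/vocabC/S01"},
    "dataset-3": {"http://example.org/vocabA/water"},
}


def equivalent_concepts(uri, mappings):
    """Return uri plus every concept reachable through symmetric mappings."""
    seen = {uri}
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for a, b in mappings:
            for src, dst in ((a, b), (b, a)):
                if src == current and dst not in seen:
                    seen.add(dst)
                    frontier.append(dst)
    return seen


def find_datasets(topic_uri, datasets, mappings):
    """Return names of datasets tagged with the topic or an equivalent concept."""
    wanted = equivalent_concepts(topic_uri, mappings)
    return sorted(name for name, topics in datasets.items() if topics & wanted)


matches = find_datasets("http://example.org/vocabB/soils", DATASETS, MAPPINGS)
print(matches)
```

Without the mappings, a query on the `vocabB` concept would match neither dataset, since each is tagged with a different vocabulary.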
Conclusions
• The major semantic challenges when integrating (meta)data arise from the lack of use of common value vocabularies, not so much from the use of different description vocabularies / schemas / formats.
• In most cases the lack of good semantics in the (meta)data at the level of value vocabularies is not due to ill will or lack of awareness, but to the constraints posed by most dataset management tools.
• The machine-readable layer and the SPARQL endpoint of the RING are not for the end users: we expect this layer to be used by developers to build added-value services for the end users on top of the featured datasets.
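For developers building on a SPARQL endpoint, a request following the SPARQL 1.1 Protocol (HTTP GET with a `query` parameter) could be prepared as sketched below. The endpoint URL is a placeholder and not the RING's actual address, and the query assumes the catalog exposes DCAT datasets with Dublin Core titles, which the slides do not state. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint; replace with the real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Assumes datasets are typed dcat:Dataset and titled with dct:title.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
} LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a GET request asking for JSON results from a SPARQL endpoint."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)

# urllib.request.urlopen(request) would send it; omitted here because
# the endpoint above is only a placeholder.
```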