Using RDF/OWL Technologies for Discovery and Use Metadata

M. Benno Blumenthal, Michael Bell, John del Corral, and Emily Grover-Kopec
International Research Institute for Climate and Society, Columbia University
http://iridl.ldeo.columbia.edu/

Dec 13, 2015



Page 2:

Definitions

• Resource Description Framework (RDF)

• Web Ontology Language (OWL)

Page 3:

Why RDF?

Web-based system for interoperating semantics

A key part of the Semantic Web

RDF/OWL is an interesting technology, but it is even more interesting when it is clear that it can help solve our problems

Page 4:

The Data Problem

Users

Datasets

Page 5:

The Tool Interface

Users

Datasets

Tools

Page 6:

Standard Metadata

Users

Datasets

Tools

Standard Metadata Schema/Data Services

Page 7:

Many Data Communities

[Diagram labels: five groups, each with Tools, Users, Datasets, and Standard Metadata Schema]

Page 8:

Super Schema

[Diagram labels: five groups, each with Tools, Users, Datasets, and Standard Metadata Schema; one overall Standard metadata schema]

Page 9:

Super Schema: direct

[Diagram labels: five groups, each with Tools, Users, Datasets, and Standard Metadata Schema; one overall Standard metadata schema/data service]

Page 10:

Flaws

• A lot of work

• Super Schema/Service is the Lowest-Common-Denominator

• Science keeps evolving, so that standards either fall behind or constantly change

Page 11:

RDF Standard Data Model Exchange

[Diagram labels: five groups, each with Tools, Users, Datasets, and Standard Metadata Schema; one overall Standard metadata schema; RDF on each connection]

Page 12:

RDF Data Model Exchange

[Diagram labels: five groups, each with Tools, Users, Datasets, Standard Metadata Schema, and RDF; one overall Standard metadata schema with RDF]

Page 13:

RDF Architecture

[Diagram labels: multiple RDF sources; Virtual (derived) RDF; queries]

Page 14:

Why is this better?

• Maps the original dataset metadata into a standard format that can be transported and manipulated

• Still the same impedance mismatch when mapped to the least-common-denominator standard metadata, but

• When a better standard comes along, the original complete-but-nonstandard metadata is already there to be remapped, and “late semantic binding” means everyone can use the new semantic mapping

• Can use enhanced mappings between models that are close

• EASIER – these are tools to enhance the mapping process

Page 15:

Sample Tool: Faceted Search

http://iridl.ldeo.columbia.edu/ontologies/query2.pl?...

Page 16:

Distinctive Features of the search

• Search terms are interrelated

• Terms that describe the set of returns are displayed (spanning and not)

• Returned items also have structure (sub-items and superseded items are not shown)

Page 17:

Architectural Features of the search

• Multiple search structures possible

• Multiple languages possible

• Search structure is kept in the database, not in the code

http://iridl.ldeo.columbia.edu/ontologies/query2.pl

Page 18:

Cast of RDF Characters

Semantic Layers: RDFS, OWL, SKOS, SWRL

Query Language: SPARQL, SeRQL

Tools and Frameworks: Protégé, Sesame, Reasoners, Redland, Jena

Page 19:

RDF: framework for writing connections

Triplets of
• Subject
• Property (or Predicate)
• Object

URIs identify things, i.e. most of the above

Namespaces are used as a convenient shorthand for the URIs
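The triple and namespace ideas on this slide can be sketched in a few lines of Python. This is only an illustration, not the IRI code: the `ex` prefix, its base URI, and the `expand` helper are assumptions made for the example, while the `dc` URI is the usual Dublin Core elements namespace.

```python
# Prefix table: each namespace prefix is shorthand for a base URI.
# "ex" and its URI are hypothetical, chosen only for this sketch.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/data/",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local


# A triple is just (subject, property, object).
# Subject and property are URIs; this object is a plain literal.
triple = (expand("ex:WOA"), expand("dc:title"), "NOAA NODC WOA01")
```

The prefixed form and the expanded form name the same thing; the prefix only saves typing.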

Page 20:

Datatype Properties

{WOA} dc:title “NOAA NODC WOA01”

{WOA} dc:description “NOAA NODC WOA01: World Ocean Atlas 2001, an atlas of objectively analyzed fields of major ocean parameters at monthly, seasonal, and annual time scales. Resolution: 1x1; Longitude: global; Latitude: global; Depth: [0 m,5500 m]; Time: [Jan,Dec]; monthly”

Page 21:

Object Properties

{WOA} iridl:isContainerOf {Grid-1x1},

{Grid-1x1} iridl:isContainerOf {Monthly}

Page 22:

WOA01 diagram

Page 23:

Standard Properties

{WOA} dcterm:hasPart {Grid-1x1},
{Grid-1x1} dcterm:hasPart {MONTHLY}

Alternatively

{WOA} iridl:isContainerOf {Grid-1x1},
{iridl:isContainerOf} rdfs:subPropertyOf {dcterm:hasPart}

Page 24:

netcdf/CF in RDF

{SST} rdf:type {cfatt:non_coordinate_variable},
{SST} cfatt:standard_name {cf:sea_surface_temperature},
{SST} netcdf:hasDimension {longitude}

Object properties provide a framework for explicitly writing down relationships between data objects/components, e.g. vague meaning of nesting is made explicit

Properties also can be related, since they are objects too

Page 25:

Noncontextual Modeling

• “noncontextual modeling make RDF the perfect glue between systems and fixed data models” – The Semantic Web

Page 26:

RDF Level

• Transport/Exchange (RDF/XML)

• Storage

• RDF APIs (Redland,Jena,Sesame)

• Query (SPARQL,SeRQL, …)

• Basic Semantics

Page 27:

RDF Semantics

RDF Primer

Truly useful property: rdf:type (“a”)

Underlying class: rdf:Property

Organizational classes: rdf:Bag, rdf:Alt, rdf:Seq, rdf:List

Structured values: rdf:value

Reification: rdf:Statement with rdf:subject, rdf:predicate, rdf:object

Bag properties: rdf:_1, rdf:_2, …

List properties: rdf:first, rdf:rest, rdf:nil

Page 28:

RDF-Schema (RDFS)

Transitive Properties rdfs:subClassOf (“is a”), rdfs:subPropertyOf

rdfs:Class, rdfs:Resource

rdfs:member

rdfs:domain, rdfs:range

rdfs:Datatype, rdfs:Literal, rdfs:Container

Referring to other RDF documents

rdfs:seeAlso, rdfs:isDefinedBy

Basic documentation rdfs:label, rdfs:comment

Page 29:

Gazetteer Classes

Page 30:

Gazetteer Individuals

Page 31:

Search Interface Term

• http://iri.columbia.edu/~benno/sampleterm.pdf

Page 32:

Semantics lead to Virtual Triples

Transitive:
{a} rdfs:subClassOf {b} rdfs:subClassOf {c}
implies {a} rdfs:subClassOf {c}
i.e. semantics of rdfs:subClassOf imply additional triples not explicitly stated

Likewise:
{a} rdfs:subPropertyOf {b} rdfs:subPropertyOf {c}
implies {a} rdfs:subPropertyOf {c}

More interestingly,
{a} myprop {b}, {myprop} rdfs:subPropertyOf {prop2}
implies {a} prop2 {b}
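The three implications on this slide can be sketched as a small fixed-point loop. This is a toy illustration in plain Python, not how Sesame or any other reasoner is implemented; the function name `infer` and the short string identifiers are assumptions made for the example.

```python
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"


def infer(triples):
    """Return the stated triples plus the virtual ones implied by the rules."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (s2, p2, o2) in closed:
                # Transitivity of rdfs:subClassOf and rdfs:subPropertyOf.
                if p in (SUBCLASS, SUBPROP) and p2 == p and o == s2:
                    new.add((s, p, o2))
                # {a} myprop {b}, {myprop} rdfs:subPropertyOf {prop2}
                # implies {a} prop2 {b}.
                if p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))
        if new <= closed:  # nothing new was derived: fixed point reached
            return closed
        closed |= new


stated = {
    ("WOA", "iridl:isContainerOf", "Grid-1x1"),
    ("iridl:isContainerOf", SUBPROP, "dcterm:hasPart"),
    ("a", SUBCLASS, "b"),
    ("b", SUBCLASS, "c"),
}
virtual = infer(stated) - stated  # only the triples that were never stated
```

Here `virtual` holds the dcterm:hasPart statement about WOA and the a-to-c subclass link, neither of which appears in `stated`.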

Page 33:

Subcategories are not subClasses

So carelessly translating existing conceptual organizations can get one into trouble

Page 34:

Domain and Range are inherited

Since the domain and range of a property are classes, then subclasses “inherit” properties (in this sense)

Page 35:

UML/RDFS

• Unified Modeling Language

• Base concepts are the same (RDFS lacks methods), so one can export the underlying structure of the code as the underlying structure for the metadata

• See Representing UML in RDF

Page 36:

Ontologies

Use Conventions to connect concepts to established sets of concepts

Generate additional “virtual” triples from the original set and semantics

RDFS – some property/class semantics

OWL – additional property/class semantics: more sophisticated (ontological) relationships

Page 37:

OWL

Language for expressing ontologies, i.e. the semantics are very important. However, even without a reasoner to generate the implied RDF statements, OWL classes and properties represent a sophistication of the RDF Schema

However, there is a serious split in world view from what we have been talking about: concepts as classes vs concepts as individuals

Page 38:

OWL

rdf:Property owl:DatatypeProperty owl:ObjectProperty owl:AnnotationProperty

owl:FunctionalProperty owl:InverseFunctionalProperty owl:TransitiveProperty owl:SymmetricProperty

rdfs:seeAlso owl:imports

owl:Ontology

Page 39:

Protégé

Tool for editing/displaying Ontologies

Different “tabs” display different perspectives

http://protege.stanford.edu/

Page 40:

Cast of RDF Characters II

Semantic Layers: RDFS, OWL, SKOS, SWRL

Query Language: SPARQL, SeRQL

Tools and Frameworks: Protégé, Sesame, Reasoners, Redland, Jena

Page 41:

Query Language: SPARQL

• (quick reference at http://www.dajobe.org/2005/04-sparql/)

• Supported by Redland, Jena, Sesame-2.0 (alpha)

• Jena implementation supports a URL as the source of triples, i.e. a triple store is not even needed

• The standard

Page 42:

Query Language: SeRQL

• Older than SPARQL

• Implemented on top of Sesame

• Currently more powerful than SPARQL, i.e. has nested queries

Page 43:

SeRQL Details

Copied from on-line tutorial

• Syntax

• Select

• Construct

• Where

• From

Page 44:

SeRQL: basic syntax

{person} foo:worksFor {Company} rdf:type {foo:ITCompany}

Page 45:

SeRQL: multiple statements

{subj1} pred1 {obj1}; pred2 {obj2}

Or

{subj1} pred1 {obj1} , {subj1} pred2 {obj2}

Page 46:

SeRQL: short cuts

{subj1} pred1 {obj1,obj2,obj3}

(also implies obj1,obj2,obj3 are distinct)

Page 47:

SeRQL: Select

SELECT dataset, dlabel

FROM {dataset} rdf:type {iridl:dataset},

[{dataset} rdfs:label {dlabel}]

USING NAMESPACE

iridl = <http://iridl.ldeo.columbia.edu/ontologies/iridl.owl>

Output as table (XML)
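To make the shape of the SELECT result concrete, here is a rough Python equivalent over an in-memory list of triples. It assumes the square brackets mark the label match as optional, as in SeRQL's optional path expressions. The sample data, the `ex:other` class, and the function name are hypothetical, and no Sesame API is involved.

```python
# Hypothetical sample data; only iridl:dataset and rdfs:label come from the query.
triples = [
    ("ds1", "rdf:type", "iridl:dataset"),
    ("ds1", "rdfs:label", "WOA01"),
    ("ds2", "rdf:type", "iridl:dataset"),  # a dataset with no label
    ("x1", "rdf:type", "ex:other"),        # not a dataset, so never returned
]


def select_datasets(triples):
    """Return (dataset, dlabel) rows; dlabel is None when no label exists."""
    rows = []
    for (s, p, o) in triples:
        if p == "rdf:type" and o == "iridl:dataset":
            labels = [o2 for (s2, p2, o2) in triples
                      if s2 == s and p2 == "rdfs:label"]
            # Optional match: keep the dataset even without a label.
            for label in labels or [None]:
                rows.append((s, label))
    return rows


rows = select_datasets(triples)
```

Each tuple in `rows` corresponds to one row of the result table, with an empty label cell represented as `None`.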

Page 48:

SeRQL: Construct

CONSTRUCT {dataset} rdf:type {foo:LabelledDatasets}

FROM {dataset} rdf:type {iridl:dataset}; rdfs:label {dlabel}

USING NAMESPACE

iridl = <http://iridl.ldeo.columbia.edu/ontologies/iridl.owl>

Output as RDF (RDF/XML)
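The CONSTRUCT query differs from SELECT in that its result is a set of new triples rather than a table. A minimal Python sketch of the same pattern follows; the sample data and function name are hypothetical, and the label match is required here because the query does not bracket it.

```python
# Hypothetical sample data for the sketch.
triples = [
    ("ds1", "rdf:type", "iridl:dataset"),
    ("ds1", "rdfs:label", "WOA01"),
    ("ds2", "rdf:type", "iridl:dataset"),  # no label, so it does not match
]


def construct_labelled(triples):
    """Build one new rdf:type triple per dataset that also has a label."""
    datasets = {s for (s, p, o) in triples
                if p == "rdf:type" and o == "iridl:dataset"}
    labelled = {s for (s, p, o) in triples if p == "rdfs:label"}
    return {(s, "rdf:type", "foo:LabelledDatasets")
            for s in datasets & labelled}


new_triples = construct_labelled(triples)
```

Because the output is itself RDF, it can be added back to the store, which is one way derived triples can be produced.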

Page 49:

Faceted Search Explicated

Page 50:

Search Interface

• Items (datasets/maps)

• Terms

• Facets

• Taxa

Page 51:

Search Interface Semantic API

{item} dc:title dc:description rss:link iridl:icon dcterm:isPartOf {item2} dcterm:isReplacedBy {item2}

{item} trm:isDescribedBy {term}

{term} a {facet} of {taxa} of {trm:Term},
{facet} a {trm:Facet},
{taxa} a {trm:Taxa},
{term} trm:directlyImplies {term2}

Page 52:

Faceted Search w/Queries

http://iridl.ldeo.columbia.edu/ontologies/query2.pl?...

Page 53:

RDF Architecture

[Diagram labels: multiple RDF sources; Virtual (derived) RDF; queries]

Page 54:

IRI RDF Architecture

[Diagram labels: Data Servers; Ontologies; MMI; JPL; Standards Organizations; Start Point; RDF Crawler; RDFS Semantics; OWL Semantics; SWRL Rules; SeRQL CONSTRUCT; Search Queries; Location Canonicalizer; Time Canonicalizer; Sesame; Search Interface; bibliography]

Page 55:

Creating Virtual Triples from Semantic Layers

Semantic Layers: RDFS, OWL, SKOS, SWRL

Query Language: SPARQL, SeRQL

Tools and Frameworks: Protégé, Sesame, Reasoners, Redland, Jena

Page 56:

SWRL

SWRL: A Semantic Web Rule Language Combining OWL and RuleML

A language for writing rules in RDF/OWL, i.e. RDF statements that are rules for creating new RDF statements

Page 57:

Simple Knowledge Organization System (SKOS)

Schema for relating concepts

Page 58:

Simple Knowledge Organization System (SKOS)

• So, for a resource of type skos:Concept, any properties of that resource (such as creator, date of modification, source etc.) should be interpreted as properties of a concept, and not as properties of some 'real world thing' that that resource may be a conceptualisation of.

• This layer of indirection allows thesaurus-like data to be expressed as an RDF graph. The conceptual content of any thesaurus can of course be remodelled as an RDFS/OWL ontology. However, this remodelling work can be a major undertaking, particularly for large and/or informal thesauri. A SKOS Core representation of a thesaurus maps fairly directly onto the original data structures, and can therefore be created without expensive remodelling and analysis

Page 59:

RDF Frameworks

Protégé: API

Redland: Bindings in many languages, supports several triple stores, some with context

Jena: Java API, some cmd line utilities, supports inference layers

Sesame: HTTP server, Java API, supports inference, version 2 alpha has context

Page 60:

Sesame

SAIL - Storage and Inference Layer

i.e. you can write down rules that imply virtual triples so that triples are generated as they are put into the store

RDF: No inference

RDFS: RDFS inference

OWLIM: Some OWL inference

Custom
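The idea of generating triples as they are put into the store can be illustrated with a toy class. This is not Sesame's SAIL API; the class name, method names, and the single rdfs:subPropertyOf rule are assumptions chosen to keep the sketch short.

```python
class InferringStore:
    """Toy triple store that applies one inference rule at insertion time."""

    SUBPROP = "rdfs:subPropertyOf"

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        """Add a triple, then keep adding whatever it implies."""
        pending = [(s, p, o)]
        while pending:
            t = pending.pop()
            if t in self.triples:
                continue
            self.triples.add(t)
            pending.extend(self._implied_by(t))

    def _implied_by(self, t):
        s, p, o = t
        out = []
        if p == self.SUBPROP:
            # New sub-property link: restate existing uses of s with o.
            out += [(a, o, b) for (a, q, b) in self.triples if q == s]
        # Existing sub-property links apply to the new statement.
        out += [(s, sup, o) for (sub, q, sup) in self.triples
                if q == self.SUBPROP and sub == p]
        return out


store = InferringStore()
store.add("WOA", "iridl:isContainerOf", "Grid-1x1")
store.add("iridl:isContainerOf", "rdfs:subPropertyOf", "dcterm:hasPart")
```

After the second `add`, the store also contains the dcterm:hasPart statement about WOA, so later queries see it without doing any reasoning themselves. The trade-off in this design is that inference cost is paid at load time rather than at query time.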

Page 61:

Jena

Java framework

In-memory and persistent stores

Inference API

Page 62:

Topics/Issues

• OpenDAP and RDF: can we transport data semantics without fixing the entire schema?

• netcdf/HDF and RDF: do we need non-contextual modeling in our metadata transport/storage?

• Concepts as classes vs concepts as individuals

• Sub-classes vs sub-categories

• OWL in detail

• Protégé demo

Page 63:

RDF Cast of Characters

Semantic Layers: RDFS, OWL, SKOS, SWRL

Query Language: SPARQL, SeRQL

Tools and Frameworks: Protégé, Sesame, Reasoners, Redland, Jena