
Jul 16, 2015

Page 1: Towards an RDF Validation Language based on Regular Expression Derivatives

Towards an RDF Validation Language based on Regular Expression Derivatives

Eric Prud'hommeaux
World Wide Web Consortium
MIT, Cambridge, MA, USA

Harold Solbrig
Mayo Clinic College of Medicine
Rochester, MN, USA

Jose Emilio Labra Gayo
WESO Research group
University of Oviedo, Spain

Sławek Staworko
LINKS, INRIA & CNRS
University of Lille, France

Page 2

Overview

Shape Expressions for RDF validation - Justification

Regular Shape Expressions
  Axiomatic Semantics
  Implementation based on Derivatives

Regular Shape Expression Schemas
  Adapt Axiomatic Semantics to Schemas
  Adapt Implementation based on Derivatives

Conclusions & Future work

Page 3

Shape Expressions

Simple and intuitive language that can:
  Describe the topology of RDF data
  Validate that RDF instance data matches a shape

Two syntaxes:
  Compact syntax (inspired by RelaxNG, Turtle and SPARQL)
  RDF

Related to the W3C RDF Data Shapes Working Group

Page 4

Example: RDF model of a Person

E-R Diagram: Person, with attributes foaf:age xsd:integer and foaf:name xsd:string +, and a 0..* foaf:knows self-reference

Shape Expressions Schema

<Person> {
  foaf:age xsd:integer
, foaf:name xsd:string+
, foaf:knows @<Person>*
}

Some RDF data

:john foaf:age 23;
      foaf:name "John";
      foaf:knows :bob .

:bob foaf:age 34;
     foaf:name "Bob", "Robert" .

:mary foaf:age 50, 65 .

Page 5

Why not SPARQL?

Shape Expression (lines 1 to 5 on the slide):

<Person> {
  foaf:age xsd:integer
, foaf:name xsd:string+
, foaf:knows @<Person>*
}

SPARQL query (lines 1 to 38 on the slide):

ASK {
  { SELECT ?Person {
      ?Person foaf:age ?o .
    } GROUP BY ?Person HAVING (COUNT(*)=1)
  }
  { SELECT ?Person {
      ?Person foaf:age ?o .
      FILTER ( isLiteral(?o) &&
               datatype(?o) = xsd:integer )
    } GROUP BY ?Person HAVING (COUNT(*)=1)
  }
  { SELECT ?Person (COUNT(*) AS ?Person_c0) {
      ?Person foaf:name ?o .
    } GROUP BY ?Person HAVING (COUNT(*)>=1)
  }
  { SELECT ?Person (COUNT(*) AS ?Person_c1) {
      ?Person foaf:name ?o .
      FILTER (isLiteral(?o) &&
              datatype(?o) = xsd:string)
    } GROUP BY ?Person HAVING (COUNT(*)>=1) }
  FILTER (?Person_c0 = ?Person_c1)
  { { { SELECT ?Person (COUNT(*) AS ?Person_c2) {
          ?Person foaf:knows ?o .
        } GROUP BY ?Person }
      { SELECT ?Person (COUNT(*) AS ?Person_c3) {
          ?Person foaf:knows ?o .
          FILTER ((isIRI(?o) || isBlank(?o)))
        } GROUP BY ?Person HAVING (COUNT(*) >= 1) }
      FILTER (?Person_c2 = ?Person_c3)
    }
    UNION {
      SELECT ?Person {
        OPTIONAL { ?Person foaf:knows ?o }
        FILTER (!bound(?o))
      }
    }
  }
}

Page 6

Regular Shape Expressions (RSEs)

Simplified version of Shape Expressions

Based on Regular Expressions
  Sets of triples instead of lists of characters
  Interleave instead of concatenation

Abstract syntax
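
The grammar shown on the slide is not reproduced in this transcript. A plausible reading, assuming the usual regular-expression constructors with interleave in place of concatenation, is sketched below. The symbols $v_p$ and $v_o$ (sets of admissible predicates and objects) and $\parallel$ (interleave) are notational assumptions, not necessarily the authors' own.

```latex
\begin{align*}
e ::=\;& \emptyset              && \text{matches no graph} \\
  \mid\;& \varepsilon           && \text{matches only the empty graph} \\
  \mid\;& v_p \rightarrow v_o   && \text{arc: one triple with predicate in } v_p \text{ and object in } v_o \\
  \mid\;& e^{*}                 && \text{zero or more repetitions} \\
  \mid\;& e_1 \parallel e_2     && \text{interleave (And)} \\
  \mid\;& e_1 \mid e_2          && \text{alternative (Or)}
\end{align*}
```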

Page 7

Shape Expressions vs RSEs*

Example 1 (Shape Expression, RSE):

<Shape1> {
  foaf:age xsd:integer
, foaf:name xsd:string*
}

Example 2 (Shape Expression, RSE):

<Shape2> {
  :a ( 1 )
, :b ( 1 2 )*
}

* Note: We are considering a subset of Shape Expressions with closed shapes and inclusive Or

Page 8

Cardinalities in RSEs

Cardinalities can be defined as:

Example:
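
The slide's formulas are not reproduced in this transcript. One standard way to derive cardinalities from the core operators is given below as an assumption rather than as the authors' exact definitions; $\parallel$ denotes interleave and $\varepsilon$ the expression matching only the empty graph.

```latex
\begin{align*}
e^{+}       &= e \parallel e^{*} \\
e^{?}       &= e \mid \varepsilon \\
e^{\{0,0\}} &= \varepsilon \\
e^{\{0,n\}} &= \varepsilon \mid \left(e \parallel e^{\{0,n-1\}}\right) && n > 0 \\
e^{\{m,n\}} &= e \parallel e^{\{m-1,n-1\}} && 0 < m \le n
\end{align*}
```

Under these definitions, `foaf:name xsd:string+` from the Person shape would expand to one mandatory arc interleaved with a starred copy of the same arc.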

Page 9

Shape of an RSE:

Example

Page 10

Simplification rules

It is easy to show that the operators obey:
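
The slide's list of rules is not reproduced in this transcript. The identities below are a sketch of the kind of rules meant, assuming $\emptyset$ matches no graph, $\varepsilon$ matches only the empty graph, $\parallel$ is interleave and $\mid$ is Or. They follow directly from those meanings, but the authors' own list may be longer or differently ordered.

```latex
\begin{align*}
\emptyset \parallel e   &= e \parallel \emptyset = \emptyset \\
\varepsilon \parallel e &= e \parallel \varepsilon = e \\
\emptyset \mid e        &= e \mid \emptyset = e \\
e \mid e                &= e \\
e_1 \parallel e_2       &= e_2 \parallel e_1 \\
e_1 \mid e_2            &= e_2 \mid e_1
\end{align*}
```

Applying such rules eagerly keeps intermediate expressions small, which matters once derivatives are taken repeatedly.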

Page 11

Matching triples with RSEs

Page 12

Example matching tree

Rules employed

Page 13

Derivatives of RSEs

Brzozowski's algorithm (1964) was developed for regular expressions

We adapted that algorithm to RSEs

It calculates the derivative of an RSE with respect to a triple t:

Definition:

Page 14

Calculating the derivative

Definitions
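
The definitions on the slide are not reproduced in this transcript. The rules below sketch how a Brzozowski-style derivative can be written when concatenation is replaced by interleave. The notation is assumed: $\partial_t$ for the derivative with respect to a triple $t$, $v_p \rightarrow v_o$ for an arc, $\parallel$ for interleave. It may differ from the authors' presentation.

```latex
\begin{align*}
\partial_t(\emptyset)   &= \emptyset \\
\partial_t(\varepsilon) &= \emptyset \\
\partial_t(v_p \rightarrow v_o) &=
  \begin{cases}
    \varepsilon & \text{if } t = \langle s, p, o \rangle,\ p \in v_p,\ o \in v_o \\
    \emptyset   & \text{otherwise}
  \end{cases} \\
\partial_t(e^{*}) &= \partial_t(e) \parallel e^{*} \\
\partial_t(e_1 \parallel e_2) &= \left(\partial_t(e_1) \parallel e_2\right) \mid \left(\partial_t(e_2) \parallel e_1\right) \\
\partial_t(e_1 \mid e_2) &= \partial_t(e_1) \mid \partial_t(e_2)
\end{align*}
```

Because interleave imposes no order on the triples, the rule for $\parallel$ lets the triple be consumed by either operand and needs no side condition on whether the other operand accepts the empty graph.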

Page 15

Matching using derivatives

Auxiliary function that returns true if an RSE matches the empty graph

The matching relation can be expressed as:
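
A minimal Python sketch of derivative-based matching, under stated assumptions: the class names, the helpers `mk_and` and `mk_or`, and the modelling of an arc's value set as a Python predicate are all illustrative and are not taken from the authors' Scala or Haskell code. `nullable` plays the role of the auxiliary function (does the expression match the empty graph?), and `matches` consumes the triples one at a time, taking a derivative at each step.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

Triple = Tuple[str, str, object]   # (subject, predicate, object)

class RSE:
    pass

@dataclass(frozen=True)
class Empty(RSE):       # matches no graph
    pass

@dataclass(frozen=True)
class Epsilon(RSE):     # matches only the empty graph
    pass

@dataclass(frozen=True)
class Arc(RSE):         # predicate plus a check on the object value
    pred: str
    ok: Callable[[object], bool]

@dataclass(frozen=True)
class Star(RSE):
    e: RSE

@dataclass(frozen=True)
class And(RSE):         # interleave
    l: RSE
    r: RSE

@dataclass(frozen=True)
class Or(RSE):
    l: RSE
    r: RSE

def mk_and(a: RSE, b: RSE) -> RSE:
    """Build an interleave, applying the Empty/Epsilon simplifications."""
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Epsilon):
        return b
    if isinstance(b, Epsilon):
        return a
    return And(a, b)

def mk_or(a: RSE, b: RSE) -> RSE:
    """Build an alternative, dropping Empty branches and exact duplicates."""
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty) or a == b:
        return a
    return Or(a, b)

def nullable(e: RSE) -> bool:
    """True if e matches the empty graph."""
    if isinstance(e, And):
        return nullable(e.l) and nullable(e.r)
    if isinstance(e, Or):
        return nullable(e.l) or nullable(e.r)
    return isinstance(e, (Epsilon, Star))

def deriv(e: RSE, t: Triple) -> RSE:
    """Derivative of e with respect to triple t."""
    _, p, o = t
    if isinstance(e, Arc):
        return Epsilon() if p == e.pred and e.ok(o) else Empty()
    if isinstance(e, Star):
        return mk_and(deriv(e.e, t), e)
    if isinstance(e, And):
        return mk_or(mk_and(deriv(e.l, t), e.r), mk_and(deriv(e.r, t), e.l))
    if isinstance(e, Or):
        return mk_or(deriv(e.l, t), deriv(e.r, t))
    return Empty()      # Empty and Epsilon both derive to Empty

def matches(e: RSE, triples: Iterable[Triple]) -> bool:
    """Consume every triple by derivation, then test for the empty graph."""
    for t in triples:
        e = deriv(e, t)
    return nullable(e)

# Hypothetical encoding of <Shape2> { :a (1), :b (1 2)* }
shape2 = And(Arc(":a", lambda o: o == 1), Star(Arc(":b", lambda o: o in (1, 2))))
print(matches(shape2, [(":x", ":a", 1), (":x", ":b", 1), (":x", ":b", 2)]))
print(matches(shape2, [(":x", ":a", 2)]))
```

The first call accepts the graph and the second rejects it, and the order in which triples are consumed does not change the outcome. Routing every construction through `mk_and` and `mk_or` is a design choice that keeps the residual expression from growing with dead `Empty` branches.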

Page 16

Example trace:

Page 17

Regular Shape Expression Schemas

Given a set of labels, an RSE schema is a function

where we extend RSEs to admit label references

Example 1:

Example 2:

<Person> {
  foaf:age xsd:integer
, foaf:knows @<Person>*
}

Corresponds to:

Page 18

From matching to typing

We extend previous definitions to include the notion of typing

A typing associates a label to a node in a context

Definitions on typings

The matching algorithm returns the typing in the context:

Page 19

Matching RSE Schemas

We define the matching of an RSE e with a set of triples as a partial function that returns a typing.

The function takes a typing context as argument

and we extend previous axiomatic definitions as...
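
A minimal Python sketch of schema matching with a typing context, under stated assumptions: the schema is a dictionary from labels to expressions, a typing is a set of (node, label) pairs, and `RefArc` is a hypothetical name for an arc whose object must match a labelled shape. None of these names come from the authors' code. A node already assumed to have a label in the context is accepted without rechecking, which is what lets recursive shapes such as `<Person>` terminate. As a simplification, the returned typing collects every pair verified along the way.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional, Set, Tuple

Triple = Tuple[str, str, object]   # (subject, predicate, object)
Pair = Tuple[object, str]          # (node, label): one entry of a typing

class RSE:
    pass

@dataclass(frozen=True)
class Empty(RSE):       # matches no graph
    pass

@dataclass(frozen=True)
class Epsilon(RSE):     # matches only the empty graph
    pass

@dataclass(frozen=True)
class Arc(RSE):         # predicate plus a check on the object value
    pred: str
    ok: Callable[[object], bool]

@dataclass(frozen=True)
class RefArc(RSE):      # predicate whose object must match shape `label`
    pred: str
    label: str

@dataclass(frozen=True)
class Star(RSE):
    e: RSE

@dataclass(frozen=True)
class And(RSE):         # interleave
    l: RSE
    r: RSE

@dataclass(frozen=True)
class Or(RSE):
    l: RSE
    r: RSE

def mk_and(a: RSE, b: RSE) -> RSE:
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    return b if isinstance(a, Epsilon) else a if isinstance(b, Epsilon) else And(a, b)

def mk_or(a: RSE, b: RSE) -> RSE:
    return b if isinstance(a, Empty) else a if isinstance(b, Empty) else Or(a, b)

def nullable(e: RSE) -> bool:
    if isinstance(e, And):
        return nullable(e.l) and nullable(e.r)
    if isinstance(e, Or):
        return nullable(e.l) or nullable(e.r)
    return isinstance(e, (Epsilon, Star))

def validate(node, label: str, graph: Set[Triple], schema: Dict[str, RSE],
             ctx: FrozenSet[Pair] = frozenset()) -> Optional[Set[Pair]]:
    """Return a typing if `node` matches schema[label] under ctx, else None."""
    ctx = ctx | {(node, label)}            # assume node:label while checking it
    result: Set[Pair] = {(node, label)}

    def deriv(e: RSE, t: Triple) -> RSE:
        _, p, o = t
        if isinstance(e, Arc):
            return Epsilon() if p == e.pred and e.ok(o) else Empty()
        if isinstance(e, RefArc):
            if p != e.pred:
                return Empty()
            if (o, e.label) not in ctx:    # not assumed yet: check recursively
                sub = validate(o, e.label, graph, schema, ctx)
                if sub is None:
                    return Empty()
                result.update(sub)
            return Epsilon()
        if isinstance(e, Star):
            return mk_and(deriv(e.e, t), e)
        if isinstance(e, And):
            return mk_or(mk_and(deriv(e.l, t), e.r), mk_and(deriv(e.r, t), e.l))
        if isinstance(e, Or):
            return mk_or(deriv(e.l, t), deriv(e.r, t))
        return Empty()                     # Empty and Epsilon

    e = schema[label]
    for t in graph:
        if t[0] == node:                   # only the node's outgoing triples
            e = deriv(e, t)
    return result if nullable(e) else None

# Hypothetical encoding of <Person> { foaf:age xsd:integer, foaf:knows @<Person>* }
schema = {"Person": And(Arc("foaf:age", lambda o: isinstance(o, int)),
                        Star(RefArc("foaf:knows", "Person")))}
graph = {(":john", "foaf:age", 23), (":john", "foaf:knows", ":bob"),
         (":bob", "foaf:age", 34), (":bob", "foaf:knows", ":john"),
         (":mary", "foaf:age", 50), (":mary", "foaf:age", 65)}
print(validate(":john", "Person", graph, schema))
print(validate(":mary", "Person", graph, schema))
```

Returning `None` on failure mirrors the idea of a partial function: a typing exists only when the node's triples can be fully consumed and the residual expression accepts the empty graph.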

Page 20

Axiomatic definitions adapted to RSE Schemas

Page 21

Derivative of an RSE in a typing context

We adapt previous definitions to typing contexts

where

Page 22

Example:

Page 23

Implementations

The algorithm has been implemented in Scala

Available at: http://labra.github.io/shexcala

We have also implemented a simplified prototype in Haskell, following the definitions in the paper

Available at: http://labra.github.io/Haws

An online version is also available at: http://rdfshape.weso.es

Page 24

First experimental results

Comparison between derivatives (deriv) and backtracking (back)

Page 25

Conclusions & Future work

Declarative algorithm to match Regular Shape Expressions
  Based on equational reasoning

Theoretical complexity is unaffected
  However, the derivatives algorithm behaves better than backtracking in practice

Future work:
  Prove the correctness of the algorithm
  Experimental results
  Align this work with current RDF Data Shapes development

Page 26

End of Presentation

Page 27

SHACL vs RSEs

At this moment, SHACL is being defined by the RDF Data Shapes WG

Some differences:
  Open Shapes (allow remaining triples)
  Arcs check that there are no other arcs with the same predicate and different values
  And operator instead of interleave
  Inclusive vs Exclusive-or

Semantics of all these features is under discussion

Page 28

Example of derivatives that don't match