© 2007 OpenLink Software, All rights reserved OpenLink Virtuoso – Linked Data Deploying Linked Data
Mar 27, 2015

© 2007 OpenLink Software, All rights reserved

OpenLink Virtuoso – Linked Data

Deploying Linked Data

Linked Data

- Term coined by Tim Berners-Lee
- Describes recommended best practice for exposing & connecting data on the Semantic Web
  - Use the RDF data model
  - Identify real or abstract things (resources) in your ‘universe of discourse’ (Data Spaces), using URIs as unique IDs
  - Make URIs accessible via HTTP so people can discover and explore these Data Spaces
  - Allow these URIs to be dereferenced and return information
  - Include links to provide ‘discovery paths’ to entities in other Data Spaces

Deployment Challenges

- Semantic ‘Data Web’ vs Traditional ‘Document Web’
- These are two dimensions of the Web separated by a common element – the URI
- Document Web
  - URIs always point to physical resources
- Data Web
  - URIs point to physical or abstract resources
- URIs for the Document and Data Webs must be interpreted differently

Web Resources

- What do we really mean by the term ‘resource’?
- The ‘Traditional’ and Semantic Webs require subtly different interpretations

Document Web Resources

In the traditional Document Web:
- All resources are document-orientated
- URI dereferencing returns a document
- Rendered representation is nearly always a document
- No real distinction between a resource and its representation
- Such resources have been referred to as ‘information resources’
- ‘Document resource’ is arguably a preferable term

Semantic Web Resources

In the Semantic Web:
- A URI need not identify a document-type resource
- The identity of a resource is distinct from its representation
  - The resource may have several possible representations
  - The most desirable representation may change, depending on the consumer (human or software-agent)
- Such resources are sometimes referred to as ‘non-information resources’
- ‘Data resource’ is a preferable term

Access vs Reference

- The Semantic and Document Webs interpret the term ‘resource’ differently
- A corollary of this difference in interpretation is:
  - The Semantic and Document Webs interpret URIs differently
  - Document Web: assumes that the resource a URI refers to is the same as the thing accessed (dereferenced)
  - Semantic Web: the resource a URI refers to is often not the same as the thing accessed – access returns a description, not the entity itself (e.g. the entity may be Paris)

Access vs Reference – Another View

- Paraphrasing Pat Hayes’ paper “In Defense of Ambiguity”
- Names (URIs) are used to both refer to (reference) and access things
- Access should be unambiguous
  - A name (URI) should provide an unambiguous access path
- Reference to abstract (physically inaccessible) entities is inherently ambiguous
  - Referring to an abstract entity relies on describing the entity
  - As there are many possible descriptions (facets), reference is ambiguous

Deployment Challenges

- We’ve established that the Semantic Web and Linked Data require:
  - Data access with unambiguous naming
  - Data (de)reference with ambiguous association
- Or put another way, we need mechanisms for an HTTP server to:
  - Answer the question “Does this URI identify a (physical) document resource or a (RDF) data resource?”
  - Provide alternative representations of a resource

Deployment Challenge Resolution

Two solutions proposed by the SemWeb Community:
- Distinguish resource type through URL formats
  - ‘Hash’ vs ‘slash’ URLs
- Content negotiation with URL rewriting

‘Hash’ vs ‘Slash’ URLs

- A solution using the syntax of the URL to differentiate ‘abstract’ resources from ‘information’ resources
- Slash URIs
  - Don’t contain a fragment identifier (#)
  - Identify document resources in traditional Web
  - E.g. http://demo.openlinksw.com/Northwind/Customer/ALFKI
    - Identifies a physical (X)HTML document
- Hash URIs
  - Contain a fragment identifier
  - Identify data resources (entities) in Semantic Web
  - E.g. http://demo.openlinksw.com/Northwind/Customer/ALFKI#this
    - Identifies the entity ALFKI, distinct from its representation
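The hash/slash distinction can be checked mechanically. The following is a minimal sketch, assuming Python's standard urllib.parse; the function names classify_uri and document_for are hypothetical and not part of Virtuoso.

```python
from urllib.parse import urldefrag


def classify_uri(uri):
    """Classify a URI by the hash/slash convention (illustrative only)."""
    _base, fragment = urldefrag(uri)
    # A non-empty fragment marks a 'hash' URI, i.e. a data resource (entity).
    return "data resource" if fragment else "document resource"


def document_for(uri):
    """Return the URI without its fragment, i.e. the document part of a hash URI."""
    base, _fragment = urldefrag(uri)
    return base


print(classify_uri("http://demo.openlinksw.com/Northwind/Customer/ALFKI"))
print(classify_uri("http://demo.openlinksw.com/Northwind/Customer/ALFKI#this"))
print(document_for("http://demo.openlinksw.com/Northwind/Customer/ALFKI#this"))
```

Note that an HTTP client strips the fragment before sending a request, so a server only ever sees the slash form of a hash URI.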

Content Negotiation

- Mechanism defined in HTTP specification
- Makes it possible to serve different versions of a document (or, more generally, a resource) at the same URL
- Software agents can choose which version they want.
  - HTML Web browsers prefer HTML/XHTML
  - Semantic Web browsers prefer RDF/XML

Content Negotiation - Example

HTTP Request:

- HTML browser requests an HTML/XHTML document in English or French

    GET /whitepapers/data_mngmnt HTTP/1.1
    Host: www.openlinksw.com
    Accept: text/html, application/xhtml+xml
    Accept-Language: en, fr

- Accept header indicates preferred MIME types
- RDF browser might instead stipulate a MIME type of application/rdf+xml or text/rdf+n3
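To illustrate how a server might act on the Accept header, here is a minimal sketch of Accept parsing with q-values. It is an assumption-laden illustration, not Virtuoso's implementation: the function names parse_accept and choose are hypothetical, and wildcards such as */* are deliberately not handled.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # HTTP default when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so types with equal q keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose(header, available):
    """Return the first acceptable media type the server can offer, or None."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in available:
            return media_type
    return None


offered = ["text/html", "application/rdf+xml"]
print(choose("text/html, application/xhtml+xml", offered))
print(choose("application/rdf+xml;q=0.9, text/html;q=0.5", offered))
```

A None result corresponds to the case where the server cannot satisfy the request in any acceptable format.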

Content Negotiation - Example

HTTP Response:

- Server redirects to a URL where the appropriate version can be found

    HTTP/1.1 302 Found
    Location: http://www.openlinksw.com/whitepapers/data_mngmnt.en.html

- Redirect is indicated by HTTP status code 302 (Found)
- Client then sends another HTTP request to the new URL
- HTTP defines several 3xx status codes for redirection

HttpRange-14 Recommendations

W3C TAG guidelines for indicating resource type through HTTP response code (aka the HttpRange-14 issue)

HTTP Response Code | Material Returned | Inference
200 (success) | A representation | Requested resource is an information resource. A representation has been returned.
303 (see other) | A URI | The resource may be an information or non-information resource. The client is being redirected to an associated representation of the resource in the desired format. The URI of the associated resource has been returned.
4xx or 5xx (error) | Nothing | The specified resource or representation format does not exist.
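From a client's point of view, the table amounts to a simple lookup on the status code. A minimal sketch follows; the function name infer_from_status is hypothetical and the returned strings only paraphrase the table.

```python
def infer_from_status(status):
    """Map an HTTP status code to the inference given in the HttpRange-14 table."""
    if status == 200:
        return "information resource; a representation was returned"
    if status == 303:
        return "information or non-information resource; follow the returned URI"
    if 400 <= status <= 599:
        return "resource or representation format does not exist"
    # Codes outside the table (e.g. 301, 302) carry no inference here.
    return "no inference defined by the table"


for code in (200, 303, 404, 500):
    print(code, "->", infer_from_status(code))
```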

Content Negotiation Decision Table

URI Type | URI | (X)HTML Representation | RDF Representation
Document resource | http://demo.openlinksw.com/Northwind/Customer/ALFKI | 200 OK | 303 (Redirect to URL that DESCRIBEs the entity http://demo.openlinksw.com/Northwind/Customer/ALFKI#this in a given Data Space)
Entity ID (Data resource) | http://demo.openlinksw.com/Northwind/Customer/ALFKI#this | 406 (Not available in this format) or 303 (Redirect to associated resource in requested representation format) | 200 OK

URL Rewriting

Is the act of modifying a URL prior to final processing by a Web server

Provides a means to build a URL ‘on the fly’ identifying the resource in the required representation format referred to by a 303 redirection

Ideal solution is a rules-based URL rewriting processing pipeline using regular expression or sprintf substitutions

URL Rewriting – Example Pipeline

Source URI (Regex) | HTTP Accept Header (Regex) | HTTP Response Code | HTTP Response Headers | Rule Processing Order
/Northwind/Customer/([^#]*) | None (i.e. default) | 200 or 303 redirect to a resource with default representation | None | Normal (order irrelevant)
/Northwind/Customer/([^#]*) | (text/rdf.n3)|(application/rdf.xml) | 303 redirect to an associated description of the resource | None | Normal (order irrelevant)
/Northwind/Customer/([^#]*) | (text/html)|(application/xhtml.xml) | 406 (Not acceptable) or 303 redirect to an associated description of the resource | For 406: Vary: negotiate, accept; Alternates: {"ALFKI" 0.9 {type application/rdf+xml}} | Last (must be last in processing chain)
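The pipeline can be sketched as a list of rules, each pairing a source URI regex with an optional Accept regex. This is a simplified illustration, not Virtuoso's rewriter: the RULES structure and select_response are hypothetical, and the precedence used here (rules with an Accept pattern are tried before the default rule) is an assumption about how the table is meant to be read.

```python
import re

URI_PATTERN = r"/Northwind/Customer/([^#]*)"

# (source URI regex, Accept regex or None, response described in the table)
RULES = [
    (URI_PATTERN, None, "200 or 303"),
    (URI_PATTERN, r"(text/rdf.n3)|(application/rdf.xml)", "303"),
    (URI_PATTERN, r"(text/html)|(application/xhtml.xml)", "406 or 303"),
]


def select_response(path, accept):
    """Pick the response for a request path and Accept header value."""
    default = None
    for uri_re, accept_re, response in RULES:
        if not re.match(uri_re, path):
            continue
        if accept_re is None:
            # Remember the default rule; only used if no Accept rule matches.
            if default is None:
                default = response
        elif accept and re.search(accept_re, accept):
            return response
    return default


print(select_response("/Northwind/Customer/ALFKI", "application/rdf+xml"))
print(select_response("/Northwind/Customer/ALFKI", "text/html"))
print(select_response("/Northwind/Customer/ALFKI", "image/png"))
```

The unescaped "." in patterns such as application/rdf.xml matches any character, which is how the rule accepts the "+" in application/rdf+xml.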

Deploying Linked Data Using Virtuoso

- Virtuoso’s approach is to implement the generic solution outlined so far, using
  - Content negotiation
  - URL rewriting
- Virtuoso includes a Rules-based URL Rewriter
  - Can be used to inject Semantic Web data into the Document Web

URL Rewriting Example – The Aim

URI dereferenced by RDF browser client

    <http://demo.openlinksw.com/Northwind/Customer/ALFKI> or
    <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this>

becomes after rewriting (omitting URL encoding)

    /sparql?query=CONSTRUCT { <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this> ?p ?o }
    FROM <http://demo.openlinksw.com/Northwind/>
    WHERE { <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this> ?p ?o }
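The mapping from either URI form to the rewritten request can be sketched in a few lines. This is only an illustration of the intended result, assuming Python's urllib.parse; describe_url is a hypothetical helper, and the encoding produced by urlencode differs in detail from Virtuoso's (for example, it also encodes "/" and "{").

```python
from urllib.parse import urldefrag, urlencode

GRAPH = "http://demo.openlinksw.com/Northwind/"


def describe_url(uri, graph=GRAPH):
    """Build the /sparql request that describes the entity behind a URI."""
    base, _fragment = urldefrag(uri)
    entity = base + "#this"  # hash URI naming the entity itself
    query = "CONSTRUCT { <%s> ?p ?o } FROM <%s> WHERE { <%s> ?p ?o }" % (
        entity, graph, entity)
    return "/sparql?" + urlencode({"query": query})


print(describe_url("http://demo.openlinksw.com/Northwind/Customer/ALFKI"))
```

Both the slash and the hash form of the customer URI yield the same rewritten request, which is the aim stated above.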

URL Rewriting for RDF Browser

URL Rewriting for iSparql

- iSparql Query Builder
  - e.g. Browsing RDF View: <http://demo.openlinksw.com/Northwind>
  - Dereferencing: <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this> or <http://demo.openlinksw.com/Northwind/Customer/ALFKI>
- UI supports two commands for dereferencing a URI:
  - ‘Explore’ (i.e. Get all links to & from)

      SELECT ?property ?hasValue ?isValueOf
      WHERE {
        { <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this> ?property ?hasValue }
        UNION
        { ?isValueOf ?property <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this> }
      }

  - ‘Get Dataset’ (i.e. Treat URI as a subgraph)

      SELECT * FROM <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this>
      WHERE { ?s ?p ?o }

URL Rewriting for iSparql: Issues

- ‘Get Dataset’ Option – Issues with URI being dereferenced: <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this>
  - Assumes URI is a named graph – It isn’t!
  - It’s a unique node ID (object ID / entity instance ID)
  - The only graph defined by our RDF View is: <http://demo.openlinksw.com/Northwind>
  - It’s not directly dereferenceable
- The cure? Construct a subgraph using URL rewriting!

Northwind URL Rewriting: The Aim

Aim of URL rewriting for the Northwind RDF view:

Create a rule for RDF browsers which will map an IRI

<http://demo.openlinksw.com/Northwind/Customer/something>

to a SPARQL query

    CONSTRUCT { <iri> ?p ?o }
    FROM <http://demo.openlinksw.com/Northwind/>
    WHERE { <iri> ?p ?o }

and rewrite the request as /sparql?query=CONSTRUCT ...

Virtuoso - URL Rewriter Key Elements

- Rewriting Rule
  - Describes how to parse a ‘nice’ URL and compose the actual ‘long’ URL of the resource to be returned
  - Two types: sprintf-based and regex-based
- Rewriting Rule List
  - Named, ordered list of rewriting rules or rule lists
  - Tried from top to bottom, first matching rule is applied
- Conductor UI for rewriting rule configuration
- Configuration API – alternative to Conductor UI, for scripts
  - Functions for creating, dropping, enumerating rules & rule lists

Conductor UI for URL Rewriter

URL Rewriter API: Enabling Rewriting

- Enabled through vhost_define( ) function
- vhost_define( ) defines a virtual host or virtual path
- opts parameter is a vector of field-value pairs
- Field url_rewrite controls / enables URL rewriting
- Field value is the IRI of the rule list to apply

e.g.

    VHOST_DEFINE (lpath=>'/Northwind', ppath=>'/DAV/Northwind/',
      vhost=>'demo.openlinksw.com', lhost=>'192.168.11.2:80',
      is_dav=>1, vsp_user=>'dba', is_brws=>0,
      opts=>vector ('url_rewrite', 'oplweb_rule_list1'));

URL Rewriter API: Summary

Functions in DB.DBA schema:
- URLREWRITE_CREATE_SPRINTF_RULE
- URLREWRITE_CREATE_REGEX_RULE
- URLREWRITE_CREATE_RULELIST
- URLREWRITE_DROP_RULE
- URLREWRITE_DROP_RULELIST
- URLREWRITE_ENUMERATE_RULES
- URLREWRITE_ENUMERATE_RULELISTS

‘Nice’ URLs vs ‘Long’ URLs

- Rewriter was developed with broader objectives than Linked Data, which influenced its terminology
- Rewriter takes a ‘nice’ URL and rewrites it as a ‘long’ URL
- ‘Nice’ URL
  - Free from parameters, typically short
- ‘Long’ URL
  - Typically contains query string with named parameters
  - Often ignored by web crawlers (viewed as highly dynamic) => low page ranking

Sprintf Rules vs Regex Rules

- For ‘nice’ to ‘long’ URL conversion
  - Functionally equivalent
  - Only difference is syntax of match pattern definition
- For ‘long’ to ‘nice’ URL conversion
  - Only works for sprintf-based rules
  - Regex-based rules are unidirectional

URLREWRITE_CREATE_REGEX_RULE

    URLREWRITE_CREATE_REGEX_RULE (
      rule_iri, allow_update, nice_match, nice_params, nice_min_params,
      target_compose, target_params, target_expn := null,
      accept_pattern := null, do_not_continue := 0,
      http_redirect_code := null );

- rule_iri: rule’s name / identifier
- nice_match: regex to parse URL into a vector of ‘occurrences’
- nice_params: vector of names of the parsed parameters. Length of vector equals the number of ‘(…)’ specifiers in the regex
- target_compose: ‘compose’ regex for the destination URL
- target_params: vector of names of parameters to pass to the ‘compose’ expression as $1, $2 etc
- target_expn: optional SQL text to execute instead of a regex compose
- accept_pattern: regex to match the HTTP Accept header
- do_not_continue: on a match, controls whether or not the next rule in the rule list is tried
- http_redirect_code: null, 301, 302 or 303. A 30x value results in an HTTP redirect

Rewriting Process

- If current virtual directory has ‘url_rewrite’ option set, server traverses any associated rule list recursively.
- For each rule in rule list:
  - Input for rule is normalised URL from first ‘/’ after host:port
  - If rule’s regex matches, result is a vector of values
  - Names & values of parameters in any query string or the request body are decoded
  - Destination URL is composed

Destination URL - Parameter Handling

Value of each parameter is taken from (in order of priority):

1. Value of a parameter in the match result
2. Value of a named parameter in the input query string
3. If POST request, value of a named parameter in request body

If parameter value cannot be derived from above sources, next rule is applied
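The priority order can be sketched as a lookup across the three sources. This is an illustrative model only; resolve_params and its arguments are hypothetical names, and returning None stands in for "the next rule is applied".

```python
def resolve_params(names, match_values, query_params, body_params=None, method="GET"):
    """Resolve each named parameter by priority; None means 'try the next rule'."""
    resolved = {}
    for name in names:
        if name in match_values:
            # 1. value captured by the rule's match pattern
            resolved[name] = match_values[name]
        elif name in query_params:
            # 2. named parameter in the input query string
            resolved[name] = query_params[name]
        elif method == "POST" and body_params and name in body_params:
            # 3. named parameter in the request body (POST only)
            resolved[name] = body_params[name]
        else:
            return None  # value cannot be derived: this rule does not apply
    return resolved


print(resolve_params(["path", "format"],
                     {"path": "/Northwind/Customer/ALFKI"},
                     {"format": "rdf", "path": "/ignored"}))
```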

URL Rewriter API – Northwind Example

Rewriting rule:

    DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
      'oplweb_rule1', 1,
      '([^#]*)',
      vector('path'), 1,
      '/sparql?query=CONSTRUCT+{+%%3Chttp%%3A//demo.openlinksw.com%U%%23this%%3E+%%3Fp+%%3Fo+}+FROM+%%3Chttp%%3A//demo.openlinksw.com/Northwind/%%3E+WHERE+{+%%3Chttp%%3A//demo.openlinksw.com%U%%23this%%3E+%%3Fp+%%3Fo+}&format=%U',
      vector('path', 'path', '*accept*'),
      null,
      '(text/rdf.n3)|(application/rdf.xml)',
      0, 303);

In effect (omitting URL encoding):

    /sparql?query=CONSTRUCT { %U ?p ?o }
    FROM <http://demo.openlinksw.com/Northwind/>
    WHERE { %U ?p ?o }

where %U is a placeholder for the original URI

URL Rewriter API – Northwind Example

Arguments in previous rule defined by URLREWRITE_CREATE_REGEX_RULE:
- nice_match arg: ([^#]*)
  - regex matches input IRI up to fragment delimiter
- nice_params arg: vector('path')
  - ‘path’ is name of first match group in nice_match regex
- accept_pattern arg: (text/rdf.n3)|(application/rdf.xml)
  - regex to match HTTP Accept header
- target_params arg: vector('path', 'path', '*accept*')
  - names of params whose values will replace %U placeholders in the target URL pattern
  - *accept* passes matched part of Accept header for substitution into &format=%U portion of query string, e.g. application/rdf+xml

URL Rewriter API – Northwind Example

Enabling Rewriting:

    DB.DBA.URLREWRITE_CREATE_RULELIST ('oplweb_rule_list1', 1, vector ('oplweb_rule1'));

    -- ensure a Virtual Directory /Northwind exists
    VHOST_REMOVE (lpath=>'/Northwind', vhost=>'demo.openlinksw.com',
      lhost=>'192.168.11.2:80');
    VHOST_DEFINE (lpath=>'/Northwind', ppath=>'/DAV/Northwind/',
      vhost=>'demo.openlinksw.com', lhost=>'192.168.11.2:80',
      is_dav=>1, vsp_user=>'dba', is_brws=>0,
      opts=>vector ('url_rewrite', 'oplweb_rule_list1'));

URL Rewriter - Verification with curl

curl utility provides a useful tool for verifying HTTP server responses and rewriting rules

    $ curl -I -H "Accept: application/rdf+xml" http://demo.openlinksw.com/Northwind/Customer/ALFKI
    HTTP/1.1 303 See Other
    Server: Virtuoso/05.00.3016 (Solaris) x86_64-sun-solaris2.10-64 PHP5
    Connection: close
    Content-Type: text/html; charset=ISO-8859-1
    Date: Tue, 14 Aug 2007 13:30:22 GMT
    Accept-Ranges: bytes
    Location: /sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind%3E+WHERE+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}&format=application/rdf%2Bxml
    Content-Length: 0
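The URL-encoded Location value is hard to read by eye. A small sketch using Python's urllib.parse decodes it back into the SPARQL query and the format parameter; the LOCATION string is copied from the curl output above, and the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

LOCATION = (
    "/sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com/Northwind/"
    "Customer/ALFKI%23this%3E+%3Fp+%3Fo+}+FROM+%3Chttp%3A//demo.openlinksw.com/"
    "Northwind%3E+WHERE+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/"
    "ALFKI%23this%3E+%3Fp+%3Fo+}&format=application/rdf%2Bxml"
)

parts = urlsplit(LOCATION)
params = parse_qs(parts.query)  # decodes '+' to space and %XX escapes

print(parts.path)
print(params["query"][0])
print(params["format"][0])
```

Decoding confirms that the redirect targets the SPARQL endpoint with a CONSTRUCT query on the #this entity and requests the format taken from the Accept header.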

URL Rewriter – URIQADefaultHost Macro

- URIQADefaultHost Macro
  - Makes rewriting rules (& RDF View definitions) more portable
  - Each occurrence is substituted with the value of the DefaultHost parameter in URIQA section of virtuoso.ini configuration file
  - DefaultHost ::= server name. e.g. www.example.com:8890

    '/sparql?query=CONSTRUCT+{+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%23this%%3E+%%3Fp+%%3Fo+}+FROM+%%3Chttp%%3A//^{URIQADefaultHost}^/Northwind/%%3E+WHERE+{+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%23this%%3E+%%3Fp+%%3Fo+}&format=%U'
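The effect of the macro can be modelled as plain text substitution. The sketch below only illustrates the idea and is not how Virtuoso implements it; expand_default_host is a hypothetical helper, and the host value is the example given for DefaultHost above.

```python
MACRO = "^{URIQADefaultHost}^"


def expand_default_host(pattern, default_host):
    """Replace every occurrence of the macro with the configured host name."""
    return pattern.replace(MACRO, default_host)


# Fragment of the target pattern from the rule above
pattern = "%%3Chttp%%3A//^{URIQADefaultHost}^/Northwind/%%3E"
print(expand_default_host(pattern, "www.example.com:8890"))
```

Because the host name lives in one configuration value, the same rule text can be deployed unchanged on servers with different host names.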