Managing Semantic Graphs with Stardog 4*

Pavel Klinov, Senior Research Engineer, Complexible Inc

* Based on Evren Sirin’s talk “Taming Big Data Variety with Semantic Graph Databases” at Smart Data 2015


Apr 08, 2017





Overview

Graphs, semantic graphs, and data variety

Semantic graphs and data integration

• RDF as unified data model

• Virtual graphs

A little on Stardog (RDF database)


About Complexible

Leading semantic technology provider since 2006 (also known as Clark & Parsia)

• software (Pellet, Stardog)

• W3C participation

Released Stardog 1.0 in 2012 (current version 4.0.1)

Raising a Series A round

http://complexible.com


Big Data Vs

• Volume
• Velocity
• Variety
• Veracity
• Volatility
• Value


Data variety is the real challenge

Based on a Paradigm4 survey of more than 100 data scientists

http://www.paradigm4.com/infographic2014/


Data Variety

Syntax: formats

Structure: schemas

https://www.flickr.com/photos/designmilk/8552219138


In complex enterprises with lots of data variety, most analytic challenges can be reduced to data integration.


Data integration space

[Figure: integrated data vs. integration effort, positioning data lakes, data warehouses, and the sweet spot]


Data integration challenge

Data lakes: [Figure: several separate RDBs]

How to query this as a single integrated data source?

Unified Data Model


Unified Data Model

Global coherent view over heterogeneous data

flexible and extensible

at the right level of abstraction

enabling automated processing and analysis

• querying

• constraint validation

• reasoning (making implicit knowledge explicit)


Graphs are everywhere

Knowledge Graph

Open Graph

Linked Open Data


Why graphs?

Generic data representation model

Utilize connectedness of the data

Flexible and extensible

Easy to compose and connect

Increasing number of graph database offerings

(Neo4j, Titan,…)


Why not graphs?

No standards for syntax, semantics, or queries


RDF, briefly

RDF addresses this standardization gap for graphs

RDF data is a set of triples (edges)

<emp:John, emp:worksFor, emp:Google>

Originally developed to publish and link data on the Web

(hence Linked Data)

But it can serve as a general graph data model
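To make "a set of triples" concrete, here is a minimal Python sketch that stores a graph as a set of (subject, predicate, object) tuples and reads edges back out in both directions. Prefixed names such as emp:John are kept as plain strings rather than expanded IRIs, and the triples about emp:Mary and emp:Employee are hypothetical additions for illustration.

```python
from collections import defaultdict

# A graph is just a set of (subject, predicate, object) triples.
# Prefixed names are plain strings here (an assumption for brevity).
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("emp:John", "emp:worksFor", "emp:Google"),
    ("emp:John", "rdf:type", "emp:Employee"),   # hypothetical extra triple
    ("emp:Mary", "emp:worksFor", "emp:Google"),  # hypothetical extra triple
}

# Re-adding an existing triple changes nothing: an RDF graph is a set.
graph.add(("emp:John", "emp:worksFor", "emp:Google"))


def outgoing_edges(g: set[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Group triples by subject; each triple is a labelled edge s --p--> o."""
    adj: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for s, p, o in sorted(g):
        adj[s].append((p, o))
    return dict(adj)


# Incoming edges are just as easy to follow: who works for Google?
works_for_google = {s for s, p, o in graph if p == "emp:worksFor" and o == "emp:Google"}

print(outgoing_edges(graph)["emp:John"])
print(works_for_google)
```

Because the graph is only a set of edges, merging two graphs is a set union, which is part of what makes RDF easy to compose and connect.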


Abstract Graph

http://www.w3.org/TR/rdf11-primer/


RDF Graph

http://www.w3.org/TR/rdf11-primer/


RDF graphs are semantic graphs

RDF graphs are graphs with meaning

• explicit references to terms and their definitions

• definitions have formal semantics

Important for creating unified data models

• thus supporting data integration

Important for declaratively describing complex information processing tasks


RDF serialization

http://www.w3.org/TR/rdf11-primer/

BASE <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX schema: <http://schema.org/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX wd: <http://www.wikidata.org/entity/>

<bob#me>
    a foaf:Person ;
    foaf:knows <alice#me> ;
    schema:birthDate "1990-07-04"^^xsd:date ;
    foaf:topic_interest wd:Q12418 .

wd:Q12418
    dcterms:title "Mona Lisa" ;
    dcterms:creator <http://dbpedia.org/resource/Leonardo_da_Vinci> .

<http://data.europeana.eu/item/04802/243FA8618938F4117025F17A8B813C5F9AA4D619>
    dcterms:subject wd:Q12418 .

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?person ?title
WHERE {
  ?person a foaf:Person ;
          schema:birthDate ?birthDate ;
          foaf:topic_interest ?interest .
  ?interest dcterms:title ?title ;
            dcterms:creator dbpedia:Leonardo_da_Vinci .
  FILTER (?birthDate < "1991-01-01"^^xsd:date)
}

SPARQL query
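A SPARQL engine answers a query like the one above, roughly, by matching each triple pattern against the data, joining the variable bindings, and then applying the FILTER. The Python sketch below does exactly that, naively, over the Turtle example. Representing IRIs and plain literals alike as strings and the xsd:date literal as a datetime.date is a simplifying assumption; real engines use indexes and query planning rather than nested loops.

```python
from datetime import date

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
SCHEMA = "http://schema.org/"
DCT = "http://purl.org/dc/terms/"
BOB = "http://example.org/bob#me"
MONA_LISA = "http://www.wikidata.org/entity/Q12418"
LEONARDO = "http://dbpedia.org/resource/Leonardo_da_Vinci"

# The Turtle example as triples; "a" expands to rdf:type.
data = {
    (BOB, RDF_TYPE, FOAF + "Person"),
    (BOB, FOAF + "knows", "http://example.org/alice#me"),
    (BOB, SCHEMA + "birthDate", date(1990, 7, 4)),
    (BOB, FOAF + "topic_interest", MONA_LISA),
    (MONA_LISA, DCT + "title", "Mona Lisa"),
    (MONA_LISA, DCT + "creator", LEONARDO),
    ("http://data.europeana.eu/item/04802/243FA8618938F4117025F17A8B813C5F9AA4D619",
     DCT + "subject", MONA_LISA),
}


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; None if impossible."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind a fresh variable, or check it agrees with its existing value.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate_bgp(patterns, triples):
    """Join the solutions of the triple patterns one after another."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# WHERE clause of the query, one tuple per triple pattern.
where = [
    ("?person", RDF_TYPE, FOAF + "Person"),
    ("?person", SCHEMA + "birthDate", "?birthDate"),
    ("?person", FOAF + "topic_interest", "?interest"),
    ("?interest", DCT + "title", "?title"),
    ("?interest", DCT + "creator", LEONARDO),
]

results = [
    (b["?person"], b["?title"])               # SELECT ?person ?title
    for b in evaluate_bgp(where, data)
    if b["?birthDate"] < date(1991, 1, 1)     # FILTER (?birthDate < "1991-01-01"^^xsd:date)
]
print(results)
```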


Schema (aka ontology)

[Figure: classes Person, Organization, and Agent linked by rdfs:subClassOf; properties worksFor and hasEmployee linked by owl:inverseOf, with an rdfs:range axiom; instances Bob and ACME connected by worksFor, hasEmployee, and rdf:type edges]
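The inferences this schema slide builds up can be sketched as forward chaining to a fixpoint. The Python below applies a handful of rules (rdf:type inheritance and transitivity of rdfs:subClassOf, rdfs:range, owl:inverseOf) to our reading of the figure: Person and Organization as subclasses of Agent, worksFor with range Organization and inverse hasEmployee, and Bob a Person who worksFor ACME. Those exact axioms are an assumption, and the sketch is not meant to reflect how Stardog implements reasoning internally.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
RANGE, INVERSE = "rdfs:range", "owl:inverseOf"

# Assumed reading of the schema figure (terms kept as short strings).
schema = {
    ("Person", SUBCLASS, "Agent"),
    ("Organization", SUBCLASS, "Agent"),
    ("worksFor", INVERSE, "hasEmployee"),
    ("worksFor", RANGE, "Organization"),
}
facts = {
    ("Bob", TYPE, "Person"),
    ("Bob", "worksFor", "ACME"),
}


def materialize(triples):
    """Apply a few RDFS/OWL rules repeatedly until nothing new is derived."""
    closure = set(triples)
    while True:
        derived = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    derived.add((s, TYPE, o2))       # x a C, C subClassOf D => x a D
                if p == SUBCLASS and p2 == SUBCLASS and s2 == o:
                    derived.add((s, SUBCLASS, o2))   # C sub D, D sub E => C sub E
                if p2 == RANGE and s2 == p:
                    derived.add((o, TYPE, o2))       # x P y, P range C => y a C
                if p2 == INVERSE and s2 == p:
                    derived.add((o, o2, s))          # x P y, P inverseOf Q => y Q x
                if p2 == INVERSE and o2 == p:
                    derived.add((o, s2, s))          # x Q y, P inverseOf Q => y P x
        if derived <= closure:                       # fixpoint reached
            return closure
        closure |= derived


inferred = materialize(schema | facts) - (schema | facts)
for triple in sorted(inferred):
    print(triple)
```

Note that ACME being an Agent needs two rounds: the range axiom first makes ACME an Organization, and only then does the subclass rule apply, which is why the loop runs until no new triples appear.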


Semantic models in RDF are:

Interoperable: no vendor lock-in

Actionable: run queries against it

Expressive: describe arbitrary (hyper) graphs

Flexible: adapt to changing data, new data, etc.

Reusable: by different apps in other domains


Viewing RDBs as RDF graphs

Take this: [Figure: relational tables]

And view it as something like: [Figure: the corresponding RDF graph]

http://www.w3.org/TR/rdb2rdf-ucr/


R2RML: mapping from RDB to RDF

R2RML is a standard for mapping RDB sources to RDF

Mapping is conceptual, vendors can:

• extract, transform, load as RDF

• query on the fly (virtual graphs)

Direct and customizable mappings
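A direct mapping turns every row into a subject and every column into a predicate. The sketch below follows the general shape of the W3C Direct Mapping (row IRIs built from the table name and primary key, one class per table, one literal triple per non-NULL column) using sqlite3. The base IRI and the EMP rows are hypothetical, and it skips details such as foreign-key reference triples, IRI percent-encoding, and typed literals.

```python
import sqlite3

BASE = "http://example.com/db/"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_mapping(conn, table, pk):
    """Yield triples for every row of `table`: one subject per row,
    an rdf:type to the table's class, one triple per non-NULL column."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        values = dict(zip(columns, row))
        subject = f"{BASE}{table}/{pk}={values[pk]}"
        yield (subject, RDF_TYPE, f"{BASE}{table}")
        for column, value in values.items():
            if value is not None:  # NULL cells produce no triple
                yield (subject, f"{BASE}{table}#{column}", value)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "EMP" (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT, deptno INTEGER)')
conn.executemany(
    'INSERT INTO "EMP" VALUES (?, ?, ?, ?)',
    [(7369, "SMITH", "CLERK", 20), (7654, "MARTIN", "ENGINEER", None)],
)

triples = list(direct_mapping(conn, "EMP", "empno"))
for t in triples:
    print(t)
```

The output vocabulary simply mirrors the schema, which is why customizable mappings are needed when the graph should use a shared domain model instead.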


Virtual graphs in Stardog

1. Register: name, properties, mappings

2. Use in queries

SELECT * {
  GRAPH <virtual://dept> {
    ?person a emp:Employee ;
            emp:department ?department .
  }
  ?department foaf:organization <urn:engineering> .
}
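Querying a virtual graph means the patterns inside GRAPH <virtual://dept> are answered by the relational source when the query runs, rather than from triples copied into the database. The sketch below imitates that with sqlite3: a hypothetical EMP table stands in for the source, employee and department IRIs follow an assumed emp:{empno} / dept:{deptno} scheme, and the remaining pattern is joined against triples held locally. Stardog's actual query translation is far more general than this hand-written SQL.

```python
import sqlite3

# Hypothetical relational source behind <virtual://dept>.
source = sqlite3.connect(":memory:")
source.execute('CREATE TABLE "EMP" (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER)')
source.executemany(
    'INSERT INTO "EMP" VALUES (?, ?, ?)',
    [(7369, "SMITH", 20), (7499, "ALLEN", 30), (7839, "KING", None)],
)

# Triples stored in the graph database itself, not in the RDB.
local_triples = {
    ("dept:20", "foaf:organization", "urn:engineering"),
    ("dept:30", "foaf:organization", "urn:sales"),
}


def virtual_employees(conn):
    """Answer `?person a emp:Employee ; emp:department ?department` by
    translating it to SQL at query time instead of copying the data."""
    sql = 'SELECT empno, deptno FROM "EMP" WHERE deptno IS NOT NULL'
    for empno, deptno in conn.execute(sql):
        # Assumed IRI scheme: emp:{empno} and dept:{deptno}
        yield {"?person": f"emp:{empno}", "?department": f"dept:{deptno}"}


# Join with the local pattern `?department foaf:organization <urn:engineering>`.
engineering = sorted(
    b["?person"]
    for b in virtual_employees(source)
    if (b["?department"], "foaf:organization", "urn:engineering") in local_triples
)
print(engineering)
```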


Customizable mapping example

emp:{"empno"} a emp:Employee ;
    emp:name "{\"ename\"}" ;
    emp:role emp:{ROLE} ;
    emp:department dept:{"deptno"} ;
    sm:map [
      sm:query """
        SELECT \"empno\", \"ename\", \"deptno\",
               (CASE \"job\"
                  WHEN 'CLERK' THEN 'general-office'
                  WHEN 'NIGHTGUARD' THEN 'security'
                  WHEN 'ENGINEER' THEN 'engineering'
                END) AS ROLE
        FROM \"EMP\"
      """ ;
    ] .
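The mapping above pairs a SQL query with triple templates. One way to read it: run the query, then fill each template from every result row. The Python sketch below does that with sqlite3 against a hypothetical two-row EMP table. It uses a simplified {column} placeholder syntax instead of the quoted {"column"} form, keeps IRIs and literals as plain strings, and drops a triple when a referenced value is NULL, which is the R2RML convention (here, a job not covered by the CASE yields a NULL ROLE).

```python
import re
import sqlite3

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand(template, row):
    """Fill {column} placeholders from `row`; None if any referenced value is NULL."""
    values = {}
    for column in PLACEHOLDER.findall(template):
        if row[column] is None:
            return None
        values[column] = row[column]
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


# (predicate, object template); the subject template is emp:{empno}
MAPPING = [
    ("rdf:type", "emp:Employee"),
    ("emp:name", "{ename}"),
    ("emp:role", "emp:{ROLE}"),
    ("emp:department", "dept:{deptno}"),
]

SQL = """
SELECT "empno", "ename", "deptno",
       (CASE "job" WHEN 'CLERK' THEN 'general-office'
                   WHEN 'NIGHTGUARD' THEN 'security'
                   WHEN 'ENGINEER' THEN 'engineering' END) AS ROLE
FROM "EMP"
"""

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows addressable by column name
conn.execute('CREATE TABLE "EMP" (empno INTEGER, ename TEXT, job TEXT, deptno INTEGER)')
conn.executemany(
    'INSERT INTO "EMP" VALUES (?, ?, ?, ?)',
    [(7369, "SMITH", "CLERK", 20), (7839, "KING", "PRESIDENT", 10)],
)

triples = set()
for row in conn.execute(SQL):
    subject = expand("emp:{empno}", row)
    for predicate, template in MAPPING:
        obj = expand(template, row)
        if obj is not None:  # unmapped job -> NULL ROLE -> no emp:role triple
            triples.add((subject, predicate, obj))

for t in sorted(triples):
    print(t)
```

Pushing the CASE into SQL keeps the vocabulary translation (CLERK to general-office) in one place, so the templates stay simple.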


Data integration with unified domain model and R2RML


Reasoning with virtual graphs

Get results that

• do not exist in the data lakes

• but follow from the domain models and mappings

Turn your data lakes into deductive databases…

… without them noticing!


Reasoning with virtual graphs: example

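Article database

Author | Article
John | http://nature.com/123

Publisher database

Publisher | Name
Springer | http://springer.com/LCNS

[Figure: the two tables viewed as RDF: John authors nature:123, nature:123 rdf:type Article, Springer publishes springer:lncs; the schema adds Article rdfs:subClassOf Publication and an rdfs:range axiom pointing to Publication, from which both nature:123 and springer:lncs get rdf:type Publication]

Goal: query for all publications across both databases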
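Reasoning over virtual graphs does not require writing inferred triples back into the sources. One way to find the publications is to rewrite the query: an instance of Publication is anything mapped to Article (via rdfs:subClassOf) plus anything that is the object of publishes (via the range axiom), and each of those becomes SQL against its own database. The Python sketch below does this with two in-memory sqlite3 databases; attaching the range axiom to publishes is our reading of the slide, and the table and column names are hypothetical.

```python
import sqlite3

# Two independent, hypothetical source databases (the "data lakes").
articles = sqlite3.connect(":memory:")
articles.execute("CREATE TABLE article (author TEXT, article TEXT)")
articles.execute("INSERT INTO article VALUES ('John', 'http://nature.com/123')")

publishers = sqlite3.connect(":memory:")
publishers.execute("CREATE TABLE publisher (publisher TEXT, name TEXT)")
publishers.execute("INSERT INTO publisher VALUES ('Springer', 'http://springer.com/LCNS')")

# Schema axioms, kept in the graph database.
SUBCLASS_OF = {"Article": "Publication"}  # Article rdfs:subClassOf Publication
RANGE = {"publishes": "Publication"}      # publishes rdfs:range Publication (assumed)

# Mappings: which SQL produces each class or property, and against which source.
CLASS_SQL = {"Article": (articles, "SELECT article FROM article")}
PROPERTY_SQL = {"publishes": (publishers, "SELECT publisher, name FROM publisher")}


def instances_of(cls):
    """Answer `?x rdf:type cls` by rewriting it into SQL over the sources.
    Nothing is materialized; the sources never see the inferred triples."""
    found = set()
    if cls in CLASS_SQL:                        # asserted through a mapping
        conn, sql = CLASS_SQL[cls]
        found |= {row[0] for row in conn.execute(sql)}
    for sub, sup in SUBCLASS_OF.items():        # x a sub => x a sup
        if sup == cls:
            found |= instances_of(sub)
    for prop, rng in RANGE.items():             # y prop x, range(prop) = cls => x a cls
        if rng == cls:
            conn, sql = PROPERTY_SQL[prop]
            found |= {obj for _, obj in conn.execute(sql)}
    return found


print(sorted(instances_of("Publication")))
```

Neither table mentions publications, yet both rows come back: the answer follows from the schema and the mappings, which is the point of turning the data lakes into a deductive database.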


Stardog: Semantic Graph Database

The leading RDF database

Pure Java: any JVM language, full REST bindings

Client-server, embedded, middleware modes

Rich feature set

Supports property graphs (TinkerPop)

ACID Transactions, High Availability, Hot backup/restore, JMX server monitoring, Access & Audit logging, RBAC security model, LDAP integration, SPARQL 1.1 queries, OWL 2 Reasoning, Proof trees, Integrity constraints, Full-text search, Geospatial support, Virtual graphs, Provenance support


Single-node Scalability

Scale up to 50B triples on modest hardware

● 32 cores, 256 GB RAM, 2 x 7200RPM HDDs, < $10K cost

Load rates up to 500k triples/second

● That’s 100M triples in 3 min, 1B in 30 min, and 20B in 20 hours

Best-of-breed query answering performance

● Query 100M triples with a throughput of 3M+ queries/hour, 1B at 500k queries/hour, and 10B at 20k queries/hour (BSBM, 64 clients)
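A quick check of the load figures above: at the peak rate of 500k triples/second the first two numbers follow directly, while 20B triples in 20 hours implies a lower sustained average, which is consistent with the phrase "up to".

```latex
\begin{aligned}
\frac{10^{8}\ \text{triples}}{5\times 10^{5}\ \text{triples/s}} &= 200\ \text{s} \approx 3.3\ \text{min},\\
\frac{10^{9}\ \text{triples}}{5\times 10^{5}\ \text{triples/s}} &= 2\times 10^{3}\ \text{s} \approx 33\ \text{min},\\
\frac{2\times 10^{10}\ \text{triples}}{5\times 10^{5}\ \text{triples/s}} &= 4\times 10^{4}\ \text{s} \approx 11.1\ \text{h},\\
\frac{2\times 10^{10}\ \text{triples}}{20\times 3600\ \text{s}} &\approx 2.8\times 10^{5}\ \text{triples/s}.
\end{aligned}
```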


Stardog for Big Data (coming 2016)

HDFS-backed storage

Horizontal partitioning of data

Advanced query planner and optimization

Parallel query execution with async messaging


Questions? @klinovp

http://complexible.com, http://stardog.com